Refactoring Varisat: 5. Incremental Solving and Proofs
This is the fifth and final post in my series about refactoring varisat. In the last post varisat gained the heuristics needed to solve some non-trivial instances. In this post we’ll add incremental solving and proof generation. This brings varisat to feature parity with the old version.
Incremental solving enables new use cases that just aren’t feasible with batch solving. SMT solvers built on top of a SAT solver and techniques like symbolic execution often generate a vast number of related SAT solver queries. Starting a new solving process for each query would result in a lot of repeated work. Incremental solving avoids some of this.
Proof generation makes it possible to independently check the work performed by a SAT solver. State of the art SAT solvers have a large codebase where it’s easy for a bug to hide. When the SAT solver finds a solution it is easy to tell whether it satisfies the input formula. For the unsatisfiable case, without proof generation, we have to trust the solver to be free of bugs. As solvers tend to have bugs from time to time, the yearly SAT competition started to require proof generation a few years ago.
Being able to generate proofs isn’t only helpful for users who want better assurance of the solver’s correctness, it’s also great for finding and analyzing bugs.
It would be great if we could arbitrarily change the input formula and have the SAT solver do as little work as necessary to solve the changed formula. If we look at how a SAT solver works we can see that some formula changes are easier to handle than others. At every point all internal clauses of the solver are implied by the input formula. This doesn’t change when new clauses are added to the input formula, so everything derived so far is still valid. This means we can just add the clause to the internal clause database. To keep all internal data structures valid we backtrack before adding the clause and make sure to watch two non-false literals of the new clause. If no two such literals exist, the clause is already propagating or in conflict, which also needs to be handled. This is implemented by extending the same code that loads the initial formula to also handle incremental additions.
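The case distinction for a newly added clause can be sketched roughly as follows. This is only an illustration in Python, not varisat's code: the function names, the DIMACS-style integer literals (a negative number is a negated variable) and the dictionary-based assignment are all assumptions made for the example.

```python
def value(lit, assignment):
    """Truth value of a literal under a partial assignment (None if unassigned)."""
    var_value = assignment.get(abs(lit))
    if var_value is None:
        return None
    return var_value if lit > 0 else not var_value


def classify_new_clause(clause, assignment):
    """Decide how to attach a clause that is added between solve calls.

    Returns (status, literals): two non-false literals to watch, the single
    non-false literal (the clause propagates it, or is already satisfied by
    it), or an empty list if every literal is false.
    """
    non_false = [lit for lit in clause if value(lit, assignment) is not False]
    if len(non_false) >= 2:
        return "watch", non_false[:2]
    if len(non_false) == 1:
        return "propagating", non_false
    return "conflict", []
```

For example, with variable 1 assigned false, `classify_new_clause([1, 2], {1: False})` reports that the clause propagates literal 2.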
While clause addition is not too complicated, clause removal is a different story. When a clause is removed some of the learned clauses derived from the removed clause may become invalid. Figuring out which clauses are not justified anymore would require keeping track of the complete derivation of each learned clause, not only the current set of clauses. In theory that would be possible, but it would slow down solving and I think it would remove too many learned clauses most of the time. In practice only clause additions are supported. It turns out there is a simple way to emulate clause removals by introducing new variables and solving under assumptions.
Solving Under Assumptions
Solving under assumptions is a feature that allows us to search for a solution with a subset of the variables fixed. This is equivalent to adding some unit clauses to the formula before solving and removing them afterwards. The difference is that these unit clauses are never added to the solver’s clause database and aren’t used to derive new learned clauses. Instead the recursive search is modified to always use the assumptions as first guesses after each restart. Either we will find a solution that matches the assumptions or we will eventually learn a clause that is falsified by the assumptions alone. This requires no modification of the core CDCL algorithm, as we treat the assumptions just like the guesses we make when propagation stops making progress. All the infrastructure to undo those guesses is already in place and can be used as is.
Again there are some details this high level description glosses over. One of them is that when we would enqueue an assumption it might be the case that the literal is already set. There are two cases to consider here. The literal might already have the assumed value. This means the assumed literal is implied by the formula together with the already enqueued assumptions. Thus we can remove that literal from the assumptions. If the literal has the opposite value, the formula is unsatisfiable under the given assumptions. Similar to a normal conflict, we can analyze the implication graph starting from that literal to see which subset of our assumptions contributed to that conflicting assignment. Discovering that a subset of the assumptions is enough to make the formula unsatisfiable is often useful in algorithms that make use of an incremental SAT solver.
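To make the two cases concrete, here is a minimal sketch of solving under assumptions. It is an assumption-laden toy, not varisat's implementation: it uses plain DPLL with naive unit propagation instead of CDCL, represents the assignment as a set of true literals, and does not analyze which subset of the assumptions caused a failure. All names are hypothetical.

```python
def propagate(clauses, assignment):
    """Naive unit propagation on a set of true literals; False means conflict."""
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assignment for lit in clause):
                continue  # clause is already satisfied
            unassigned = [lit for lit in clause if -lit not in assignment]
            if not unassigned:
                return False  # every literal is false
            if len(unassigned) == 1:
                assignment.add(unassigned[0])
                changed = True
    return True


def search(clauses, assignment):
    """Ordinary DPLL search over the variables that are still free."""
    variables = {abs(lit) for clause in clauses for lit in clause}
    free = [v for v in sorted(variables)
            if v not in assignment and -v not in assignment]
    if not free:
        return assignment
    for guess in (free[0], -free[0]):
        trial = set(assignment)
        trial.add(guess)
        if propagate(clauses, trial):
            result = search(clauses, trial)
            if result is not None:
                return result
    return None


def solve(clauses, assumptions=()):
    """Return a set of true literals, or None if unsatisfiable under the assumptions."""
    assignment = set()
    if not propagate(clauses, assignment):
        return None
    for lit in assumptions:  # assumptions act as the first guesses
        if lit in assignment:
            continue  # already implied, nothing to enqueue
        if -lit in assignment:
            return None  # opposite value: unsatisfiable under the assumptions
        assignment.add(lit)
        if not propagate(clauses, assignment):
            return None
    return search(clauses, assignment)
```

With the clauses `[[1, 2], [-1, 3]]`, assuming `-2` forces 1 and then 3, so additionally assuming `-3` hits the "opposite value" case and `solve` returns `None`.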
Emulating Clause Removal
Assumptions allow us to emulate incremental clause removal with the small catch that we have to decide whether a clause should be removable during its addition. This works by adding a new, unique indicator variable to each clause that is a candidate for removal. To solve with the clause active, we assume that indicator variable to be false. If we assume it to be true, the clause is already satisfied which has the effect of temporarily removing the whole clause. Even better, if we add a unit clause consisting just of the indicator variable, simplification will remove the clause permanently. If we know that we want to remove a set of clauses all at the same time, we can use the same indicator variable for all of them.
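The encoding can be tried out with any solver that supports assumptions. The sketch below uses a brute-force stand-in written only for this example (the `solve` function and the variable numbering are assumptions, not varisat's API) and shows a removable clause being switched on, switched off, and finally removed for good.

```python
from itertools import product


def solve(clauses, assumptions=()):
    """Brute-force stand-in for an incremental solver: returns a model or None."""
    variables = sorted({abs(lit) for clause in clauses for lit in clause}
                       | {abs(lit) for lit in assumptions})
    for values in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, values))

        def holds(lit):
            return model[abs(lit)] == (lit > 0)

        if all(holds(lit) for lit in assumptions) and all(
            any(holds(lit) for lit in clause) for clause in clauses
        ):
            return model
    return None


formula = [[1, 2], [-1, 2]]      # these two clauses force variable 2 to be true
indicator = 3                    # fresh variable used only as an on/off switch
formula.append([-2, indicator])  # the removable clause (-2), guarded by the indicator

with_clause = solve(formula, assumptions=[-indicator])    # clause is active
without_clause = solve(formula, assumptions=[indicator])  # clause is switched off

formula.append([indicator])      # unit clause: the guarded clause is satisfied forever
after_removal = solve(formula)
```

Here `with_clause` is `None`, because the active clause contradicts the rest of the formula, while the other two calls find a model.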
This works well as long as we’re using only a few indicator variables, i.e. either a few clauses or a few sets of clauses, each sharing the same indicator variable. In my experience, with enough assumptions, at some point it becomes faster to switch to non-incremental solving instead. I’m aware that there have been attempts to overcome this problem one way or another, but so far I haven’t looked into that.
As satisfiability of a formula is easily shown by providing a model, we only need to consider the unsatisfiable case. We want to produce a proof of unsatisfiability that is efficient to generate and can be checked mechanically. Checking should also be fast.
To ensure that the proof can be checked mechanically we cannot just allow any arbitrary sound reasoning step. Instead we need a formal system for which an algorithm can decide the validity of a derivation step.
In part 3 of this series I’ve argued that clause learning is equivalent to a number of resolution steps. Generating a resolution proof would be a possible choice. In such a proof each step derives a new clause from two previous clauses that share a variable with opposing polarities. Given the input clauses and the variable to resolve on, the output of a resolution step is uniquely determined. A resolution proof can thus be checked by computing each resolvent and verifying that it matches the resolvent given in the proof. The big problem with this approach is that resolution proofs tend to be huge. Learning a single clause often involves many resolution steps. This means a significant amount of time is spent on outputting a proof that often would be too large to handle.
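A single resolution step and its check are small enough to write down directly. This is a sketch under the assumption that clauses are collections of DIMACS-style integer literals; the function names are made up for the example.

```python
def resolve(clause_a, clause_b, var):
    """Resolve clause_a (containing var) with clause_b (containing -var)."""
    if var not in clause_a or -var not in clause_b:
        raise ValueError("clauses do not clash on the given variable")
    return (frozenset(clause_a) - {var}) | (frozenset(clause_b) - {-var})


def check_resolution_step(clause_a, clause_b, var, claimed):
    """A checker recomputes the resolvent and compares it with the proof's claim."""
    return resolve(clause_a, clause_b, var) == frozenset(claimed)
```

A full resolution proof would be a long list of such steps, which is exactly where the size problem comes from.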
This is where so-called clausal proofs come into play. The idea here is to define our formal system in a way that minimizes the amount of proof output required for each learned clause. It turns out that there is a suitable formal system where deriving a clause consists of just the clause itself. This is called RUP for reverse unit propagation or AT for asymmetric tautology. A clause $C$ is an asymmetric tautology with respect to a formula $F$ if the clause’s negation together with the formula leads to a conflict under unit propagation. This ensures that $F \land \lnot C \rightarrow \bot$, which is equivalent to $F \rightarrow C$. This makes adding an AT clause to a formula logically sound.
This is a useful definition for us, as every learned clause is an asymmetric tautology. A learned clause is constructed by finding a set of literals that cause a conflict. The learned clause forces at least one of them to be assigned differently. If we assume the negation of the learned clause we arrive at the same conflict. This allows us to derive each learned clause in a single step with no additional information to justify it. This system is also able to emulate resolution, as the resolvent of two clauses is an AT with respect to the resolved clauses. This means, compared to resolution proofs, we’re not limiting our choice of techniques in addition to CDCL.
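The AT check itself can be sketched in a few lines if we don't care about speed. The following is a deliberately naive illustration (quadratic propagation, hypothetical function names, DIMACS-style integer literals), not how an efficient checker would do it.

```python
def propagates_to_conflict(clauses, assignment):
    """Run naive unit propagation from a set of true literals; True on conflict."""
    assignment = set(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assignment for lit in clause):
                continue  # clause is already satisfied
            unassigned = [lit for lit in clause if -lit not in assignment]
            if not unassigned:
                return True  # every literal is false: conflict
            if len(unassigned) == 1:
                assignment.add(unassigned[0])
                changed = True
    return False


def is_asymmetric_tautology(clauses, candidate):
    """Assume the negation of the candidate clause and look for a conflict."""
    return propagates_to_conflict(clauses, {-lit for lit in candidate})
```

For instance, `[2, 3]` is the resolvent of `[1, 2]` and `[-1, 3]`, and `is_asymmetric_tautology([[1, 2], [-1, 3]], [2, 3])` confirms that it is an AT with respect to those two clauses.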
Generating such a proof is done by simply writing all added clauses to a file. To speed up checking, clause deletions are also emitted. This limits the growth of the set of clauses to consider for unit propagation. This is the basis for the DRAT proof format. Generating DRAT proofs is required by the SAT competition and supported by most solvers. It uses a different system called RAT (resolution asymmetric tautology) that is required to support some techniques not yet implemented by varisat. RAT is a strict superset of AT so we can ignore the difference for now.
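In the plain-text variant of DRAT, an added clause is written as its literals followed by a terminating `0`, and a deletion is the same line prefixed with `d`. A minimal writer could look like the sketch below; the class and method names are assumptions for illustration and say nothing about how varisat structures its proof output.

```python
import io


class DratWriter:
    """Writes clause additions and deletions in the textual DRAT format."""

    def __init__(self, out):
        self.out = out

    def add_clause(self, clause):
        # An added clause is its literals followed by a terminating 0.
        self.out.write(" ".join(map(str, list(clause) + [0])) + "\n")

    def delete_clause(self, clause):
        # A deletion is the same line prefixed with "d".
        self.out.write("d " + " ".join(map(str, list(clause) + [0])) + "\n")


buffer = io.StringIO()
writer = DratWriter(buffer)
writer.add_clause([2])        # a learned unit clause
writer.delete_clause([1, 2])  # a clause that is no longer needed
writer.add_clause([])         # the empty clause concludes the proof
```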
I think a big downside of this format becomes clear when we consider how to check such a proof. The system is defined using unit propagation and we’ve seen that an efficient implementation of that isn’t exactly trivial. Therefore it’s easy to have bugs hidden in the checker. Another downside is that checking a proof easily takes as long as solving the instance in the first place.
Unit Propagation Hints
These two problems are even more significant when developing a formally verified checker. Formally verifying software often involves trading an efficient approach for one that is easier to prove correct. We cannot really afford to make checking any slower, but it’d be a much easier task if checking could be done with a simpler algorithm. One way forward here is to augment a clausal proof with the minimal amount of information required to make checking trivial. This approach is taken by the proof format LRAT.
The allowed steps in the LRAT format are the same as in the DRAT format. Unlike the DRAT format where an addition step consists of just stating the added clause, LRAT also encodes how the negation of the clause causes a conflict. This is done by giving unit propagation hints. These hints consist of a list of clauses in the order they become unit during propagation, finishing with the clause that is in conflict. With this, the checker doesn’t need to find out which clauses become propagating and can avoid implementing everything needed to make unit propagation fast. To check a clause addition it starts with a partial assignment consisting of the added clause’s negation. For each clause in the hints it checks that the listed clause becomes propagating under the current partial assignment and then updates the assignment with the propagated literal. For the final clause it checks that all literals are false.
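The checking procedure described above translates almost line by line into code. The sketch below assumes the clause database is a dictionary from clause ID to a list of DIMACS-style literals and follows the description strictly; real LRAT checkers may be more lenient, and the function name is hypothetical.

```python
def check_with_hints(clause_db, new_clause, hints):
    """Check a clause addition using unit propagation hints (a list of clause IDs)."""
    if not hints:
        return False
    # Start from the negation of the added clause, stored as a set of true literals.
    assignment = {-lit for lit in new_clause}
    for clause_id in hints[:-1]:
        clause = clause_db[clause_id]
        if any(lit in assignment for lit in clause):
            return False  # satisfied clauses cannot propagate
        unassigned = [lit for lit in clause if -lit not in assignment]
        if len(unassigned) != 1:
            return False  # the hinted clause is not propagating
        assignment.add(unassigned[0])
    # The final hint must be in conflict: all of its literals are false.
    return all(-lit in assignment for lit in clause_db[hints[-1]])
```

Note that there are no watch lists and no search for propagating clauses; each hint is inspected exactly once.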
This is much simpler to implement and leads to much faster proof checking. The downside has to do with how the propagation hints are encoded. In the DRAT format a clause is specified by listing its literals. Encoding the clauses of the propagation hints like this would take up a lot of space. To avoid this the LRAT format assigns a unique increasing integer ID to each clause. When a clause is added the ID of the new clause is specified and the propagation hints use these IDs to refer to previously added clauses.
Providing propagation hints and keeping track of clause IDs makes generating LRAT from a solver more complicated. In fact the LRAT paper states this:
“Producing LRAT proofs directly from SAT solvers would add significant overhead both in runtime and memory usage, and it might require the addition of complicated code. Instead, we extended the DRAT-trim proof checker to emit LRAT proofs.”
Generating the LRAT proof from a DRAT checker makes sense if the goal is only to enable verified checking. I’m more interested in reducing proof checking time, though, and for that it’s not useful.
Thus in varisat 0.1 I decided to add LRAT proof generation anyway. Collecting the propagation hints during conflict analysis is relatively straightforward, as you’re already visiting the involved propagating clauses in reverse chronological order. Recursive clause minimization makes it a bit more complicated, as the involved clauses are visited out of order. In varisat 0.1 I solved this by scanning the trail and implication graph a second time after the clause was minimized.
For varisat 0.2 I realized that the extra clauses used for recursive minimization always propagate before the clauses found during initial analysis. Now I’m sorting the clauses involved in recursive minimization by propagation order and prepending them to the clauses found during initial analysis.
I think managing clause IDs is the bigger problem. This introduces memory overhead, as we need to store an ID for each clause. While 32 bits are enough to identify a clause at a single point in time, we need globally unique IDs, so we want to use 64-bit IDs. It’s also complicated by the fact that unit and binary clauses are stored differently from long clauses. There is no good way to store clause IDs with binary clauses. In varisat 0.1 I decided to only keep 32-bit IDs for long clauses, reusing IDs. This simplified the logic within the solver quite a bit. Then for proof generation I keep a map of current 32-bit IDs, current binary clauses and current unit clauses to the unique increasing IDs used by LRAT.
I never got around to measuring the memory overhead, but initial benchmarking showed a solving runtime overhead of around 15%. In isolation I’d agree that this is a significant overhead. If you consider the time saved during checking, though, direct LRAT generation still comes out far ahead.
Nevertheless, for varisat 0.2, I wanted to improve on this.
Avoiding Clause IDs
Generating a clausal proof with propagation hints would be much easier if there were no need to maintain clause IDs. I realized that instead of using IDs, I could simply use clause hashes. I’m using a very simple hash function that is just the bitwise exclusive or of the hashes of the clause’s literals. This has the nice property that it is invariant to permutation of the literals. Canceling of duplicated values is no problem here, as all solver internal clauses have no duplicated literals.
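The idea of such a hash can be sketched as follows. The per-literal mixing function here is an arbitrary choice made for the example (a multiplication by an odd 64-bit constant followed by an xorshift); the text above does not say which literal hash varisat uses, so treat every constant and name as an assumption.

```python
MASK = (1 << 64) - 1


def lit_hash(lit):
    """Hypothetical 64-bit mixing function for a single literal."""
    x = ((lit & MASK) * 0x9E3779B97F4A7C15) & MASK
    return x ^ (x >> 32)


def clause_hash(clause):
    """XOR of the literal hashes, so the order of the literals does not matter."""
    result = 0
    for lit in clause:
        result ^= lit_hash(lit)
    return result
```

Any permutation of the same literals, such as `[1, -2, 3]` and `[3, 1, -2]`, gives the same hash. A duplicated literal would cancel itself out, as in `clause_hash([1, 1]) == 0`, which is why the absence of duplicates matters.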
When using hashes to identify clauses, there might be some collisions where different clauses have the same hash. For propagation hints, this is not a problem. The checker can simply try every matching clause and discard clauses that do not propagate. Additional propagating clauses cannot cause a conflict to disappear. For clause deletions, though, we need to list all literals to uniquely identify a clause. Initial benchmarking indicates that with this approach the additional overhead compared to generating DRAT proofs is less than 5% in runtime. The memory overhead is basically zero. The only downside I’ve discovered so far is that the proof files tend to be quite a bit larger than the corresponding LRAT proofs would be. I might be able to improve this a bit by dynamically adjusting the hash sizes and by using a more compact encoding.
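How a checker might deal with colliding hashes in a propagation hint is sketched below. This is a guess at the general shape, not varisat's checker: it assumes the checker keeps a dictionary from hash to the list of clauses with that hash, and the function name and return values are invented for the example.

```python
def propagate_hint(clauses_by_hash, assignment, hint_hash):
    """Try every clause stored under hint_hash against a set of true literals.

    Returns "conflict" if some candidate has only false literals, "propagated"
    if at least one candidate was unit, and "failed" otherwise.
    """
    progress = False
    for clause in clauses_by_hash.get(hint_hash, []):
        if any(lit in assignment for lit in clause):
            continue  # satisfied, cannot propagate
        unassigned = [lit for lit in clause if -lit not in assignment]
        if not unassigned:
            return "conflict"
        if len(unassigned) == 1:
            assignment.add(unassigned[0])
            progress = True
        # Candidates with two or more unassigned literals are simply discarded.
    return "propagated" if progress else "failed"
```

If two clauses share a hash and only one of them is unit, the other is skipped without affecting the result.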
Of course this is a different proof format now. To be useful we need a proof checker for this format. Varisat 0.2 comes with a built-in checker for this custom format. Apart from basic types for variables and literals, the checker is completely independent from the solver part. For the proof checker, it’s not too difficult to assign LRAT-compatible clause IDs. Thus I added support for generating LRAT proofs to the checker. Generating an LRAT proof this way is a lot faster than generating and converting a DRAT proof.
On the Fly Checking and Proof Processing
Having a checker and a solver in the same codebase makes it possible to combine both. This results in a self-check mode that runs them concurrently and avoids the serialization round trip. I imagine this will also be helpful for debugging, as any unjustified step will instantly give me a stack trace containing the responsible code. This also enables emitting an LRAT proof while solving, without writing a custom varisat proof to disk.
Verifying is not the only interesting thing that can be done with a proof. A clausal proof can be seen as a graph of clauses and the implications between them. A correct proof stays valid when removing clauses from which the empty clause cannot be reached. This is called proof trimming and might even remove some of the input formula’s clauses. In that case we can find an unsatisfiable formula that is a subset of the input, also called an unsatisfiable core. The drat-trim proof checker is able to compute trimmed proofs and unsat cores. In addition to that I have some other ideas for processing proofs that I want to try. To enable all this, varisat’s checker comes with a public API that allows for custom code to process a proof. By using the self-checking mode it’s also possible to run such a proof processor concurrently with the solver. The conversion to LRAT already uses that API and I plan to use it to add trimming and unsat-core computation.
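Trimming boils down to a backwards reachability pass over the proof graph. The sketch below assumes that each derived clause is known by an ID together with the IDs of the clauses used to justify it, and that input clauses are exactly those IDs without a derivation. The data layout and names are assumptions for illustration and are unrelated to varisat's checker API.

```python
def trim(derivations, empty_clause_id):
    """Return (kept derived clause IDs, input clause IDs forming an unsat core).

    derivations maps the ID of each derived clause to the IDs of the clauses
    used to justify it. IDs without an entry are input clauses.
    """
    needed = set()
    stack = [empty_clause_id]
    while stack:
        clause_id = stack.pop()
        if clause_id in needed:
            continue
        needed.add(clause_id)
        stack.extend(derivations.get(clause_id, []))
    kept = [clause_id for clause_id in sorted(derivations) if clause_id in needed]
    core = {clause_id for clause_id in needed if clause_id not in derivations}
    return kept, core


# Inputs are clauses 1 to 4; clause 7 is the empty clause.
# Clause 6 and input clause 4 never contribute to it.
kept, core = trim({5: [1, 2], 6: [4, 5], 7: [5, 3]}, empty_clause_id=7)
```

In this example the trimmed proof keeps the derived clauses 5 and 7, and the input clauses 1, 2 and 3 form an unsatisfiable core.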
Now that varisat 0.2 supports everything varisat 0.1 does, the next step for me is to go over the public API and improve user documentation in preparation for a crates.io release of varisat 0.2. Although this concludes the series about refactoring varisat, I’m sure that I’ll continue blogging about varisat from time to time. So if you don’t want to miss those posts, you can subscribe to the RSS feed or follow me on Twitter.