Friday, February 10, 2012

Solving Hexiom using constraints

After my first try using brute force search to solve the Hexiom puzzler, and following recommandations I received in comments, I switched to an approach using constraint solving principles. It's not yet finished (I can still think of new constraints to add), but it's already working more or less as efficiently as the search version. I will try to highlight my findings in this post.

Search algorithm

My search algorithm was quite simple:

  1. choose an empty cell
  2. look for all tiles that can possible fit and try each one in turn
  3. decide if the resulting position is solved, still open, or impossible
  4. if still open, continue at step 1.
That being said, the latest implementation already took the current position into account in step 2. and 3. For instance, when looking for possible tiles, I only tried tiles with available numbers (obviously), and in a range of values that would still be correct considering surrounding values. That last element is already a kind of constraint application, albeit a simple one.
The third step is quite powerful too, in deciding whether a position is impossible to solve. It also computes possible values and detects mismatches. It can prune out a good deal of possiblities by finding violated constraints.

Constraint solving algorithm

The algorithm is, in fact, an hybrid between constraint solving and search.

  1. When reading the input file, each cell is filled with the list of possible tiles, depending on available numbers, number of neighboring cells and locked tiles
  2. A set of constraints is used to remove possibilities from cells. This step is repeated until nothing moves any longer.
  3. Then we fall back on search, as before:
  4. choose a cell
  5. find all possible tiles and try each one in turn
  6. decide if the resulting position is solved, still open, or impossible
  7. if still open, repeat a step 2.
The tricky parts are number 2. and number 4.

Regarding constraints (step 2.), some are easy to find, some require a bit more work. I currently use three constraints, but I have thought of some more. The first constraint I use is the same as was implemented in the search algorithm: remove values that are outside the range of possible neighbors. For instance on the following figure, black cells have not been solved yet and we are trying to find which values are still possible in the blue cell. The minimum value is two, because we already have two neighbors. The maximum value is five, because we already have one empty cell as neighbor and the other ones are either filled or could be filled.

The second and the third constraints are mostly like the ones used in sudoku: when all tiles of a current value have been used, remove that possibility from all other cells, and when the number of cells that can host a given value is exactly the same as the number of available tiles with that value, fill all the cells with that value.

Step 4. is where heuristic is important: I have to decide which cell to try first and in which order to try the possible tiles. I have implemented four strategies to choose the next cell. One finds the cell with the fewest choices, the second finds the cell with the most choices, the third finds the cell with the highest possible value and the last just scans cells linearly, from top to bottom (like the search algorithm).


The current results are good but not perfect. On the one hand, with the good heuristic, I can now solve level 40 in under a second and level 38 in two hours, whereas they completely resisted the search algorithm. On the other hand, solving the 38 other easy levels is a little slower than with the search. The code is available as usual on github. The java version could probably do with some low level optimization, like minimizing the number of instanciated objects.
Among heuristics, the one that chooses cells linearly does globally better than others. I does not fit my immediate intuition. I was inclined to choose cells with the least choices, thinking I would get better chances. It seems intuition is not always the friend of science.

What's next

This puzzle is a suprisingly rich field of exploration. I have a few more directions to go.

The first is expanding the set of constraints. I'm thinking of one constraint that could help remove the "empty" possibility from cells, when it's mandatory to fill remaining cells with tiles in order to reach a given count. Another one, or the same in reverse, could force empty cells when the correct number of neighbors is already present.

The second is around heuristics. What bothers me is that several of the ones I have implemented are perfect in some situation and awful in others. I can try to find new ones. Or I can try to take advantage of multiple cores by running several solvers in parallel, with different heuristics, and hope that one of them finishes quickly.

The third is another approach to the problem. Only choosing which cells to leave empty is sufficient to determine the complete board. Then it's only a question of matching the resulting board with the set of available tiles to see if it works. I'll describe that approach later, but I'm not sure it can lead to a workable solver. I'm afraid I won't be able to prune out branches as effectively.

The fourth is about automation. If you remember, my goal was to get the 60 Kongergate points granted for solving a random size-6 board in less than three minutes. To reach that goal, I have started using to play for me. It's not yet done, but it can almost read a size-4 board. After that, it will only need to launch the solver, read the results, and move the tiles accordingly. Should be fun to watch...


Thanks to fellow redditor julesjacob who described the general principles of constraint propagation that I implemented here and gave invaluable advice and pointers to good resources. And also to leonardo_m who piqued my curiosity for the D language by porting the Java version of the search algorithm.

You can also comment on reddit. 

1 comment:

  1. Hello,

    If you are interested, here is a 10 lines CP model using Scampi CP Solver (in Scala):
    It solves the level 40 with only 2 backtracks and 75 ms.