Tuesday, January 31, 2012

Solving Hexiom (perhaps you can help)

Recently, when killing time on Kongregate and looking for achievements to boost my level, I discovered Hexiom, a puzzle game by a guy called Moonkey. As all good puzzlers, this one features a very small set of rules and gives birth to a large range of problems, with varying degrees of difficulty. The game comes with forty pre-defined levels and a "random-level" feature. To obtain 60 points on Kongregate, you need to beat the "impossible" challenge: complete a random level of size six in less than three minutes. Well, that might not be impossible, but it's darn tough. So my immediate thought was: "Ok, let's just write a solver for this thing, and pocket the 60 points!" But as of today, I still don't have those points.

The rules

So, what exactly are the rules of this game? As you can see, you have a board of hexagonal 'squares' (I call them cells), and you have hexagonal tiles in some of those cells. The tiles have a number on them, ranging from zero to six. They also have a color, but this one is changing. It's not a characteristic of the tile, as I will explain shortly. The goal is to put all tiles in cells such that each tile has a number of neighbors that exactly matches the number on the tile, as you can see below.

The number on the tile must match the number of neighbors
The color of each tile changes, just to give you a hint about its current status. If it's yellow, the number of neighbors is correct. If green, the tile does not have enough neighbors, if red, it has too many of them. A solved level is, therefore, filled with yellow tiles.

In some level, there are additional constraints. Some cells might be simply unavailable: they appear completely flat. Some other cells might have a tile already locked in, with grey 'claws'.

And that's all! You can move any free tiles to any free cell, until you come up with a correct combination.

Modeling the problem

I first defined some structures for the puzzle and a way to describe the levels in text files and to read them. A level is:

  • a board containing cells
  • a list of tiles (just the count of tiles for each possible number)
  • a set of fixed tiles, if any.
The board is initially built just knowing its size. In fact in my model, all cells are created, even those where no tile can be put. The "is neighbor of" relation is precomputed to speed up the search.
The list of tiles is filled from all the tiles present on the board at startup. I just count the number of tiles for each possible value, because tiles don't have individual identity. I have also added a special value to mean "no number." Each cell that has no tile on it at startup counts as one "no number" tile. That way, when doing the search, it's always possible to chose a tile to fill a cell, even though it's not a real visible tile.
The set of fixed tiles is simply modeled as a partial solution: some of the tiles already have a destination. Something I noticed is that unavailable cells can simply be modeled as cells filled with a "no number" tile. Since there is already a tile in the cell, it's impossible to put another one: the cell is really unavailable. And since the tile is a "no number", it does not impose any constraint on neighboring cells. It's nice that it fits in the general model, without needing a special case: it makes the algorithm simpler.

The current solution

My current solver is a simple "search and backtrack". I examine each cell in turn, look for all possible tiles that could fit (determining what is the current minimum and maximum value possible), try them all, one after the other, and do the same on the next cell, if the current position is still acceptable.
At first, I programed in Python, for the ease of iterative testing. It was enough for the first batch of levels, but it quickly became too long. I did some micro-optimizations first: changing some data structures, and some loops. It went a little faster. I tried using the excellent PyPy instead of the standard CPython: it cut the runtime by about two.
The next not-so-logical step was to port the algorithm to Java. The performance got somewhat better and I could solve a few more levels. After profiling, I realized that a lot of time was spent creating and deleting short-lived objects, and using HashMaps. So I converted most HashMaps to arrays and found ways to drastically reduce object management. Another good speed gain.

And then I did what I should have done from the start: focus on improving the algorithm itself! The main bottleneck was the function that tested if a working position was solved or not. I changed it to return a status telling if the position was solved, still open, or impossible because of some failed constraint. That small change made it possible to prune out some search branches really early and solve all but two of the forty levels in no time. The current Java program solves 38 levels in about 3.5 seconds on my machine. After backporting that change to the Python code and when run with PyPy, it takes 46 seconds to solve the same levels.

The current limitations

But still, level 38 and level 40 resist the brutish-force search. So I ported the code again, to C this time. Since the Java code had already been mostly 'de-objectized', it was quite an easy translation. But the result is disappointing. Compiled with full optimization, either with Cygwin-GCC or with Visual C++ 2010 Express, the runtime is about the same as the Java version. It's even a bit slower, actually: 4.7 seconds vs 3.5 for Java.

Another limitation, if I were to try to grab the 60 points, is that all my solvers need a text file as input, and currently, typing the data for a size-6 level already takes a good part of the available three minutes. And even if the solver was instantaneous, I would still have to read the result and move all the tiles manually to their destination. If I really want those 60 points, I will have to resort to my old game-solving tricks: read the screen, solve the problem, and automate the displacement of tiles by simulating mouse events.

Where you can help

All my code is freely available on github. If you find a flaw in my algorithm, a problem with my code or think of something different that might run faster, just tell me.

Comments available also on reddit.

1 comment:

  1. If you're interested in a mathematical approach, one idea is this: Your available cells basically give you a graph (in the "graph theory" sense), while the numbers on all the tiles give you a "degree sequence". A degree sequence (where each element in the sequence corresponds to a vertex in a graph and its value specifies the number of adjacent vertices in the graph) generates a finite number of graphs. What your task basically boils down to is finding the graph generated from the degree sequence that is also a subgraph to the one generated from the available "cells".
    This doesn't take into account the "locked" tiles (except to place further constraints on the matching graph), but it might be a good place to start. Do some reading on wikipedia and mathworld to see if there are any algorithms out there that might prove useful.