Thursday, March 31, 2011

Genetically engineered Qwop (part 1)

You might have heard about QWOP. I didn't. That's a comment on Reddit that made me discover "the hardest game ever." I believe the original home page for the game is at foddy.net. Go try to play it before reading on. You might end up very frustrated, or alternatively rolling on the floor laughing (while Qwop is in fact creeping on the ground, doing negative scores).

Wrong way...

After some time trying hopelessly to master the limbs of the athlete, I wondered what the best way to make a playing bot would be. That's a game without an obvious solution. There is not even a way to know what to do to run better. But at least, the inputs are quite simple. That's the perfect test case for some genetic programming!


# The idea

Ok, so what is needed for genetic programming? A form of DNA encoding, from which individual entities can be built and a fitness function telling us if the algorithm is going in the right direction. Generally, there is also a simulated environment where the entities "live."
In my case, the simulated environment is the Flash game itself. I will only drive it with synthesized key strokes. The advantage is that it's the real thing. But there are two main drawbacks. The first is that the simulation is real-time, which is quite slow and will have to run for long hours. The second is that while running, I cannot use the computer, because the simulator is moving the mouse and generating keystrokes. Hopefully, we have virtual machines to solve the second one. I used a VirtualBox Ubuntu GNU/Linux environment to drive the game. On a standard quad-core desktop machine, it appears to run at the same speed as the non-virtual game.

The fitness function is easy to find: it's the distance reached by Qwop before falling down, possibly combined with the time it took, to have a rough idea of speed. The interesting part is finding a DNA that generates a population vast and rich enough to host a possible winner. So what's the idea? I think that Qwop can run correctly with a repeating sequence of key strokes, at least once he reaches his cruising speed. So my DNA should produce a series of key events at some given time intervals, plus a repetition period. Let's try to formalize that.

I will write an uppercase letter for a key down, a lowercase letter for a key up, a plus sign for a unit pause. So for instance the string "Q+qW+wO+oP+p" means "Press Q for one unit of time, then simultaneously release Q and press W for one unit of time, release W and press O for one unit of time, and the same again for P." By the way, that sequence is not so bad. Qwop can reach 100 metres with it, but quite slowly, so, we'll try to find something better.


# Down to the code

Last time I chose to program in Python, because I love the language. But I used some Windows-specific libraries and some people were frustrated they could not see the program run on their machine. So this time I'm going to use Java. It's arguably less expressive than Python, but I know that the GUI and window interface is quite portable: my code for capturing the screen and sending synthesized key events should run unmodified on Window, Linux, Mac OSX and probably Solaris. I could even run Jython on top of the low-level calls. Or clojure for the lisp-lovers around here... For now I will start with pure Java. I will also include a functional, if rudimentary GUI to pilot the simulation.

The main class for interacting with the desktop is java.awt.Robot. It does everything I need: capture one pixel, or a rectangle on the screen and send mouse and key events. Let's start with screen capture. I will do as with Papa's Pizzeria: take some reference screenshots and search the screen for the reference patterns. I'll try to locate the game area by finding the light blue border around the message that appears on the opening screen. It has an homogeneous color (#9dbcd0), is four pixel wide, starts at 124,103 and goes to 505,306. But as you might have noticed, almost the same message box appears when a run is finished, except it's not exactly at the same position. It's at 129,99. So, in order to make it as easy as possible to run the program, and not force you to reload the game each time, I compute the real location of the game depending on which message box is shown. And how do I know? Easy! I try to find the two bright yellow ribbons that appear on the finishing screen. That's also the way I detect when the run has ended.


# Going the distance

The other thing I need is a way to read the current distance reached. That's the tricky part. I first thought I could apply some mathematical morphology to detect shapes, lines and characteristic points (end of lines, crossings...). That's certainly a valuable approach and I should work a little more on the theoretical part, but I finally decided to go for the easy route. I grab the part of the screen where the distance is displayed. I apply a very simple thresholding function to turn the image to black and white. Then I segment the text by finding fully empty pixel columns. And once each box is found, I match it with the reference images I have taken for all digits. The comparison sums the number of equals pixels and computes the percentage of identical pixels over the comparison area, so that small digits don't give overly low scores (like 1 or the decimal point). When the result of a comparison is over 90 % I consider it a match. Below that value, I drop the result. That way, I don't have to make a special case for letters, which don't match anything: they are naturally eliminated.

You can see from the screenshots that my segmentation algorithm is not perfect: the letters e and t are stuck in the same box, because there is no clear column between them. Hopefully, I don't want to read the text and it seems digits are always cleanly separated on screen. Would I need to read letters, I would probably try to segment using connex parts. That would probably work here because the threshold function has done a good job, but it's close...


# Hurdles

If you have played the game long enough, you have seen the hurdle at 50 meters. It's the only one in the game, but not in my project: I have faced several big issues, and I don't know yet if my approach will yield any good runner. The biggest trouble is that the real-time keystroke playing is not perfectly stable. The same string of key presses and pauses does not always give the same result in the game. I tried my best to send keystrokes at regular intervals, but even then, our poor athlete sometimes behaves differently. It forces me to run candidates several times and to consider statistically significant results only.

Another problem I found when trying to make the simulator run for several hours is that the game probably has memory leaks. After a few hours of uninterrupted play (something no human would ever do, I have to admit), the memory consumption of the Flash player is really slowing down the whole machine. I had to code a small workaround that simulates a F5 key press at regular intervals to reload the browser page and start with a fresh Flash player.


# Going genetic

After the first few tries, it's obvious that the task is going to be tough. Most of my random athletes fall on their head, or fold their limbs in very unnatural ways. So, the first step is to generate enough healthy candidates to initiate the genetic pool. Let's say that the first generation should contain only runners that can reach 5 meters in less that a minute and not crash at least 50% of the time. That's the condition I will set for them to be able to "reproduce". It might seem easy. I assure you it is not. I've had to run hundreds of random Qwops to get a handful of valid ones.

And that's where I am today, still running batches of random guys, hoping to generate some acceptable ones. There is no genetic programming yet. I still need to figure out what kind of mutation and reproduction I will do to try to produce better sprinters from the early cripples. I accept any good ideas you might have. I will need them, I think.

Ouch! That really hurts!


# If you want to try

The code is hosted at gihub https://github.com/slowfrog/qwopper. Running 'ant package' in the top-level directory should produce a jar file in the dist/ directory. Running the jar directly will open a small UI where you can launch some runs. The simulation can also be run from the command line like 'java -cp dist/qwopper.jar com.slowfrog.qwop.ui.Main 10 5'. That will run five different random athletes, each one for ten tries. Currently, that simulation automatically stops the runner after one minute, to avoid wasting too much time on the first batch.

8 comments:

  1. Yay! I remember suggesting that not too long ago. I think it's a good idea, so I'm glad I'm not the only one who had it!

    ReplyDelete
  2. Unfortunately, I get an error that it can't find the origin when I try to detect the play area. Bummer!

    ReplyDelete
  3. I see one problem with this approach. There are probably many different and wholly incompatible sequences that will get a decent run. I don't see any reason to believe that merging two of those together will gain you anything except chaos. If this were something that could run millions of iterations, then I'd say give it a try.

    BTW: I hope I am either misinformed or just wrong, but I don't think I am.

    ReplyDelete
  4. The problem you face is that the runner is unstable in the upright position. He's a sort of inverted pendulum. So you have to stabilise him. There are two ways: static and dynamic.

    Static means you put the runner into a stable position. In this case that's on one knee. So static stabilisation means knee-crawling.

    Dynamic stabilisation is quite possible, but requires feedback. You will have to track some of the body parts and use them to get variables that can feed into your genetic program in real-time. Doing this by image recognition is not impossible, but pretty hard: some body parts get occluded by others, so you have to use estimators that can be updated by direct measurement but carry on creating output without it. This is after you write a decent body part recognition routine. So I would turn to the other option of reverse engineering. You can look into the flash file while it's executing to find where the body part position information is stored, and grab it directly as input to your programs.

    ReplyDelete
  5. I think you should split the code into two genes, one for starting, and another for repeating. it is so important for a good runner to get a good start, which definitely has a separate set of timing compared to the repetition part of this thing.

    can you write a code in your formulation that can produce a decent runner by hand? that would be a good test, and would also give you a way to evaluate the range of physics calculations that could change while using the same genetic code.

    ReplyDelete
  6. One way you could get a good set of sample runners is to have people who currently are able to get at least a few steps down play the game and record their actions into your notation. Assuming that such people exist.

    ReplyDelete
  7. Changing to:
    int refColor = 0x87afc5;

    ...in matchesBlueBorder

    made it work somewhat better for me

    ReplyDelete
  8. /** Unit delay in milliseconds when playing a 'string' */
    private static final int DELAY = 150;
    This may well not be final. Maybe, the gene should look like something like this:
    "Q140qW160wO140oP160p" and maybe the delays should be randomized and evolved as well.

    ReplyDelete