Last time I messed around with RL, I solved the classic Mountain Car problem using Q-learning and Experience Replay (ER).

However, it was very basic in a lot of ways:

Last time I messed around with RL, I solved the classic Mountain Car problem using Q-learning and Experience Replay (ER).

However, it was very basic in a lot of ways:

Last time, in case you missed it, I left off with a laundry list of things I wanted to expand on with Genetic Algorithms (GA). Let’s see which of those I can do this time!

This is pretty wordy and kind of dry, since I was just messing around and figuring stuff out, but I promise the next one will have some cool visuals.

Hey there!

Mountain Car (MC) is a classic Reinforcement Learning (RL) problem. It was briefly shown in a video I was watching, so I figured I’d give it a shot.

So last time, I solved the egg drop puzzle in a few ways. One of them was using a recent learn, Markov Decision Processes (MDP). It worked, which got me really stoked about them, because it was such a cool new method to me.

However, it’s kind of a baby process that’s mostly used as a basis to learn about more advanced techniques. In that solution to the problem, I defined the reward matrix and the transition probability matrix , and then used them explicitly to iteratively solve for the value function v and the policy p. This works, but isn’t very useful for the real world, because in practice you don’t *know* and , you just get to try stuff and learn the best strategy through experience. So the real challenge would be letting my program try a bunch of actual egg drops, and have it learn the value function and policy from them.

I was recently at the art store for some reason, just browsing. I found the linoleum stamp section at the back and immediately wanted to make some! We had made them in 5th grade art class or something, and I remember liking it a lot, but had never since then. They’re kind of the perfect type of art for me, since I seem to like 3D things with more of a “crafts” element. I like carving/whittling anyway, so this was perfect.

I grabbed a few (pretty cheap), and on the way home thought of what I’d do: make a square stamp with weaving paths, asymmetric, such that it could be stamped out in a grid to either create cool repeating patterns, or random ones.

My friend Mike recently showed me a puzzle game called Skyscrapers, which you can play here. It’s a neat idea, in the general theme of “fill in the numbers with these constraints” puzzles like Sudoku or Verbal Arithmetic.

The rules are like so. You’re given a board like this, representing a group of city blocks (one building per square), with numbers around the sides:

I first heard this puzzle when taking an algorithms class in undergrad. The prof presented it as a teaser for the type of thing you could solve using algorithmic thinking, though he never told us the answer, or what the way of thinking is. Then, it more recently came up with my friends while we were hiking, and we were talking about it. I’ll talk about what I have so far, but first let’s say what the puzzle actually is.

There’s a building with 100 floors. You have two identical crystal eggs. They will break if dropped from (or above) some height (the same height for both), and you’d like to find that height using the fewest number of drops possible. If you drop an egg from some height and it doesn’t break, you can use that egg again. Once an egg is broken (i.e., you dropped it from that breaking height or above), you can’t use that egg again. So the question is, what’s the best dropping strategy?

A while ago I watched this video by Numberphile (a very cool channel!). In terms of the actual math, it’s pretty light, but what I love about it is the idea of creating very cool images via very simple rules.

The idea of it is this. In the video, the guy shows a way of drawing a sequence of numbers (more on that in a minute). You draw a number line, and then you draw a semicircle *above* the line from the first number of the sequence to the second, with a diameter equal to the distance between the points. Then, you draw a semicircle from the second to third numbers of the sequence in the same way, except this time, the semicircle goes *under* the number line. You continue this way as long as you want, alternating above/below for the semicircles.

Round 3! Okay, this is where I try and polish it up in a couple ways.

Here are the things I said last time I needed to make better:

Genetic Algorithms are cool!

I was recently skimming through Russel and Norvig’s AI: A Modern Approach and came to the section on Genetic Algorithms (GA). Briefly, they’re a type of algorithm inspired by genetics and evolution, in which you have a problem you’d like to solve and some initial attempts at solutions to the problem, and you combine those solutions (and randomly alter them slightly) to hopefully produce better solutions. It’s cool for several reasons, but one really cool one is that they’re often used to “evolve” to an optimal solution in things like design of objects (see the antenna in the Wikipedia article). So, that’s kind of doing evolution on *objects* rather than living things. Just take a look at the applications they’re used for.