Okay Declan, let’s try to make this post a short and sweet update, not a rambling Homeric epic about simple stuff.
I got a Raspberry Pi (RPi) and an RPi camera because I wanted to learn about them and mess around with them. If I could do image recognition with them, that’d be a good platform for ML, NN, and, if I got enough data, maybe even DS-type stuff. Luckily, there are a ton of resources and code out there already. I drew heavily on www.pyimagesearch.com, which is a REALLY useful site with explanations that are great for beginners. Two articles that I basically copied code from and then butchered were this and this.
He’s not quite doing “image recognition” in this code; it’s more like “difference recognition”. Very simply, he has a stream of frames coming in from the camera. He starts by taking what will be considered a “background frame”. Then, for every subsequent frame, he takes the absolute difference between the current frame and the background, pixel by pixel (all in grayscale, to keep it simple). If the two frames were identical, you’d expect very little difference. If an object appeared in the new frame, the difference would show that object. Then he uses some OpenCV tools to figure out where the object is and draw a box around it.
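The background-subtraction idea can be sketched in a few lines of NumPy. This is a simplified illustration, not his code: it assumes two grayscale frames as 8-bit arrays, uses a made-up threshold of 25, and puts a single box around *all* changed pixels, whereas the real script uses OpenCV’s contour functions to find separate objects.

```python
import numpy as np

def diff_box(background, frame, thresh=25):
    """Return (x, y, w, h) around pixels that differ from the background
    by more than `thresh`, or None if nothing changed enough."""
    # Cast to a signed type first so the subtraction can't wrap around
    # (uint8 arithmetic would turn 10 - 20 into 246).
    delta = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    changed = delta > thresh
    if not changed.any():
        return None
    ys, xs = np.nonzero(changed)          # row and column indices of changed pixels
    x, y = int(xs.min()), int(ys.min())
    return x, y, int(xs.max()) - x + 1, int(ys.max()) - y + 1

# Usage: a flat gray background with a bright 3x2 "object" pasted in.
bg = np.full((10, 10), 100, dtype=np.uint8)
cur = bg.copy()
cur[4:6, 2:5] = 220                       # rows 4-5, columns 2-4
print(diff_box(bg, cur))                  # (2, 4, 3, 2)
print(diff_box(bg, bg))                   # None
```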
I was able to put his code together and run it pretty quickly (though I removed some stuff, like uploading to Dropbox, and instead naively sent the files to my other machine via scp), producing this gif of local traffic outside my window:
Of course, the devil is in the details. If you watch it a few times, you’ll notice some weird behavior. Most obviously, boxes are drawn around the objects, but then the boxes linger where the object was for several frames after it has moved on. Here you can see it frame by frame:
Why does this happen? Well, it’s actually a smart feature, but implemented in a somewhat clumsy way. In his code, he has the following inside the main frame-capturing loop (I combined the few relevant snippets):
    if avg is None:
        print("[INFO] starting background model...")
        avg = gray.copy().astype("float")
        rawCapture.truncate(0)
        continue

    cv2.accumulateWeighted(gray, avg, alpha)
    frameDelta = cv2.absdiff(gray, cv2.convertScaleAbs(avg))
Here, the variable gray is the (grayscale) frame we’re capturing each time. The avg variable is the background I mentioned, which we’ll be subtracting from all following frames. So the if statement simply sets avg to the first gray frame (as a float) if it hasn’t been set yet, then clears the capture buffer and skips to the next frame. The last line is simple, too: it takes the absolute difference between gray and avg (converted back to 8-bit with convertScaleAbs()) each time. But the middle line is the key. The OpenCV function accumulateWeighted() lets you keep a running weighted average. The first argument (gray) is what you’re adding to the average, the second argument (avg) is the average being updated in place, and the last (alpha) determines how heavily to weight the new addition. This is actually a pretty smart feature: if you ran this all day, the lighting and other stuff would change, so eventually you’d be comparing how the scene looked at 6PM to how it looked at noon, and even frames with no objects might trigger a detection. So this is an “adaptive” background, which was a smart choice on his part.
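Per pixel, accumulateWeighted() applies the update avg ← (1 − alpha)·avg + alpha·gray (the formula from OpenCV’s documentation; the optional mask argument is ignored here). A minimal pure-Python sketch on a single pixel shows why this is an exponential moving average: old frames never vanish outright, they just fade by a factor of (1 − alpha) per frame. The pixel values and alpha = 0.5 are made up for illustration.

```python
def accumulate_weighted(src, avg, alpha):
    """Single-pixel version of the running-average update that
    cv2.accumulateWeighted applies to every pixel."""
    return (1 - alpha) * avg + alpha * src

# The scene brightens from 100 to 140 (say, the lighting changes) and stays there.
avg = 100.0
history = []
for gray in [140, 140, 140, 140]:
    avg = accumulate_weighted(gray, avg, alpha=0.5)
    history.append(avg)
print(history)   # [120.0, 130.0, 135.0, 137.5]
```

A larger alpha makes the background catch up to lighting changes faster, but it also soaks moving objects into the background faster.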
So can you see the problem? To illustrate it, here’s another example of three images, where I’ve also plotted the avg and frameDelta images for each:
(It looks kind of crappy because I just arranged a bunch of windows rather than making it produce a grid.)
Anyway, you can probably tell what’s happening. For each example, the middle column shows avg after it’s been updated with the current gray frame, and the right column shows the resulting frameDelta. Notice that in the 2nd row, avg still contains a “ghost” of the arm. Because accumulateWeighted() runs on every frame, including the ones where the arm is present, the arm gets blended into the background. So even once the arm is gone from gray, absdiff(gray, avg) still shows it, and a box gets drawn where the arm used to be.
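Here’s a single-pixel sketch of the ghost, following the original order (update avg first, then take the difference). The numbers are invented: the pixel is 0 with no arm and 200 when the arm covers it, alpha is 0.5, and a difference above 5 counts as “something is here”. The arm is present for only one frame, but the pixel keeps getting flagged afterwards because avg still remembers it.

```python
def flags_original(frames, alpha=0.5, thresh=5):
    """Update the running average *before* differencing, as in the original loop.
    Returns (delta, flagged) for every frame after the first."""
    avg = float(frames[0])                       # first frame seeds the background
    out = []
    for gray in frames[1:]:
        avg = (1 - alpha) * avg + alpha * gray   # accumulateWeighted step
        delta = abs(gray - avg)                  # absdiff step
        out.append((delta, delta > thresh))
    return out

# Arm appears in frame 1 only; frames 2-5 are empty again.
for delta, flagged in flags_original([0, 200, 0, 0, 0, 0]):
    print(delta, flagged)
# 100.0 True
# 50.0 True
# 25.0 True
# 12.5 True
# 6.25 True
```

Only the first line is the real arm (and even it shows up at half strength, since avg already absorbed half of it); the other four are the ghost decaying by half each frame.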
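So I fixed it by computing the difference against the background first, and only updating the background when no object was detected. In pseudocode: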
    for frame in cameraStream:
        frameOccupied = False
        gray = grayscaleAndOtherOps(frame)
        frameDelta = absdiff(gray, avg)
        frameOccupied = isObjectInFrame(frameDelta)
        # Do stuff with the object if there is one
        if not frameOccupied:
            cv2.accumulateWeighted(gray, avg, alpha)
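A runnable version of that fixed loop, on synthetic NumPy frames instead of the camera. `is_object_in_frame` is a hypothetical stand-in (any pixel differing by more than a threshold), alpha and the threshold are made up, and the float average is rounded back to 8-bit roughly the way cv2.convertScaleAbs does. The key change is the ordering: difference against the current background first, and only fold the frame into avg when nothing was detected.

```python
import numpy as np

def is_object_in_frame(frame_delta, thresh=5):
    """Hypothetical detector: True if any pixel changed by more than thresh."""
    return bool((frame_delta > thresh).any())

def run(frames, alpha=0.5):
    """Return one occupied/empty flag per frame after the first."""
    avg = frames[0].astype(np.float64)           # background model (float, like avg)
    flags = []
    for gray in frames[1:]:
        # Round the float background back to 8-bit, then take the absolute difference.
        bg = np.clip(np.rint(avg), 0, 255).astype(np.int16)
        frame_delta = np.abs(gray.astype(np.int16) - bg)
        occupied = is_object_in_frame(frame_delta)
        # ... handle the detected object here ...
        if not occupied:
            # Only empty frames update the background, so the arm never gets in.
            avg = (1 - alpha) * avg + alpha * gray
        flags.append(occupied)
    return flags

empty = np.zeros((4, 4), dtype=np.uint8)
arm = empty.copy()
arm[1:3, 1:3] = 200
print(run([empty, arm, empty, empty, empty]))   # [True, False, False, False]
```

The trade-off is that while something sits in the frame, the background stops adapting entirely, so if the lighting shifts during that stretch, the detector could stay “occupied” for a long time.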