Motion Detection on a Shaky Video with OpenCV + Python

Too long; did not read

Run the code in my GitHub repository here.

True Story Follows

I’ve been grinding on my one rep max calculator in association with, but I’ve been finding that the input needs to be more sterile than I’d like in order for the algorithm to work. Additionally, the algorithm runs pretty slow (maybe about an hour to process 20 seconds of video).

I learned previously that the way to optimize such problems is to take advantage of the known constraints of your use case as much as possible. You can check out my other blog post about the brute force method I used to detect barbells, but I recently revisited the problem to see if I could find more known constraints to exploit. What I realized I hadn’t done was examine the entire video at once rather than a frame at a time. In the case of a barbell moving up and down in a fairly identical pattern, you could imagine that if you convert the video to motion detection frames, you could take the union of all the frames, and ideally you would be left with one large rectangle in the center of the screen representing the range of motion of the barbell.
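As a rough sketch of that union idea (not the code from the repo), the snippet below ORs thresholded motion frames together and takes the bounding box of the result. It assumes grayscale motion frames stored as 2-D NumPy arrays; the function names and the `threshold` value are made up for illustration.

```python
import numpy as np

def union_of_motion(motion_frames, threshold=30):
    """Collapse grayscale motion frames into one boolean mask.

    A pixel is True if it exceeded `threshold` (an assumed cutoff) in any frame.
    """
    union = np.zeros(motion_frames[0].shape, dtype=bool)
    for frame in motion_frames:
        union |= frame > threshold
    return union

def bounding_box(mask):
    """Return (top, bottom, left, right) of the True region, or None if empty."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return None
    return int(rows[0]), int(rows[-1]), int(cols[0]), int(cols[-1])
```

On clean footage the bounding box would approximate the barbell's range of motion; any stray pixel that crosses the threshold in any frame enlarges it.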

Unfortunately, this didn’t work. The reason is that too much noise existed in the video because of a shaky camera, so you end up creating a union of pixels filling up most of the screen.

However, it dawned on me that the motion of the barbell should be different from the rest of the motion. The background motion was detected because the camera was moving back and forth, whereas the barbell motion came from an object actually moving across the screen with a given velocity.

My hypothesis, then, was that we could examine a set of frames through time and have the back-and-forth motion cancel itself out. The results were pretty solid.

Sample Videos

Before delving into any further explanation, you can see the progression of the algorithm below:

Original Video

Basic Motion Detection

Motion Detection with shakiness removed

Motion Detection with artificial shakiness to cancel out more motion

The Algorithm

You can examine the code itself on my GitHub repo, but here’s the idea:

For basic motion detection, take 3 frames. Get 2 resultant differential frames. Exclusive OR the two resultant frames. This will output colored pixels representing motion detection. This was the foundation for the barbell detection algorithm that followed.

import cv2
import numpy as np

def _get_motion_detection_frame(self, previous_previous_frame, previous_frame, frame):
    # Difference each consecutive pair of frames, then XOR the two differences.
    d1 = cv2.absdiff(frame, previous_frame)
    d2 = cv2.absdiff(previous_frame, previous_previous_frame)
    motion_detection_frame = cv2.bitwise_xor(d1, d2)
    return motion_detection_frame
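To show how the three-frame step might be driven over a whole clip, here is a small sketch that slides a window of three frames across any iterable of frames. It is an illustration rather than the repo's code: `iter_motion_frames` is a hypothetical name, and `absdiff` is a NumPy stand-in for `cv2.absdiff` so the example runs without OpenCV.

```python
from collections import deque

import numpy as np

def absdiff(a, b):
    # NumPy stand-in for cv2.absdiff on uint8 frames: widen the type first
    # so the subtraction cannot wrap around.
    return np.abs(a.astype(np.int16) - b.astype(np.int16)).astype(np.uint8)

def iter_motion_frames(frames):
    """Yield one motion frame for every frame after the first two."""
    window = deque(maxlen=3)
    for frame in frames:
        window.append(frame)
        if len(window) == 3:
            oldest, middle, newest = window
            d1 = absdiff(newest, middle)
            d2 = absdiff(middle, oldest)
            yield np.bitwise_xor(d1, d2)
```

A clip of N frames yields N - 2 motion frames, since the first two frames only fill the window.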

For all the complexity that I later added, it was well worth making the motion detection a bit more intelligent. The idea is that as we traverse through motion detection frames, we can examine the past second (or whatever time length – this should vary based on what you’re trying to detect and what sort of speed you expect it to have), and from there we can subtract all of the previous motion from the current frame’s motion. In this way, shakiness will get cancelled out, but an object that’s moving across the screen will be unaffected.

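def _remove_shakiness(self, frames):
    clean_frames = []
    min_previous_frame_count = 6
    max_previous_frame_count = 20
    window_size = max_previous_frame_count - min_previous_frame_count
    for index, frame in enumerate(frames):
        # Look back between 6 and 20 frames. Clamp the bounds at zero so
        # negative indices don't silently wrap around to the end of the list.
        start = max(0, index - max_previous_frame_count)
        end = max(0, index - min_previous_frame_count)
        previous_frames = frames[start:end]
        missing_frame_count = window_size - len(previous_frames)
        if missing_frame_count > 0:
            # Not enough history yet, so borrow frames from the end of the video.
            previous_frames = previous_frames + frames[-missing_frame_count:]
        cumulative_motion = self._get_cumulative_previous_motion(previous_frames)
        # Subtract in a signed type so the result can go negative, then clamp at zero.
        final_frame = frame.astype(int) - cumulative_motion.astype(int)
        final_frame[final_frame < 0] = 0
        # Collect the cleaned frame (the list was previously returned empty).
        clean_frames.append(final_frame.astype(frame.dtype))
    return clean_frames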

def _get_cumulative_previous_motion(self, array_list):
    resultant_array = np.zeros(array_list[0].shape)
    for array in array_list:
        resultant_array = np.maximum(resultant_array, array)
    return resultant_array
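To see why subtracting recent motion separates shake from real movement, here is a toy, standalone version run on synthetic data. It is a simplified sketch and not the repo's code: the function name, the shorter look-back window (`min_back`, `max_back`), and the choice to skip padding when there is no history are all assumptions made for illustration.

```python
import numpy as np

def remove_shakiness(motion_frames, min_back=2, max_back=5):
    """Subtract the per-pixel maximum of recent motion from each motion frame.

    The history for frame i is frames[i - max_back : i - min_back], clamped at 0.
    """
    cleaned = []
    for index, frame in enumerate(motion_frames):
        start = max(0, index - max_back)
        end = max(0, index - min_back)
        history = motion_frames[start:end]
        if history:
            previous_motion = np.maximum.reduce(history)
        else:
            # No history yet: leave the frame untouched (a simplification).
            previous_motion = np.zeros_like(frame)
        difference = frame.astype(np.int16) - previous_motion.astype(np.int16)
        cleaned.append(np.clip(difference, 0, 255).astype(np.uint8))
    return cleaned

# Toy data: pixel 0 lights up in every frame (camera shake), while a second
# bright pixel moves one column to the right per frame (the object).
frames = []
for t in range(10):
    frame = np.zeros((1, 12), dtype=np.uint8)
    frame[0, 0] = 200
    frame[0, t + 1] = 200
    frames.append(frame)

cleaned = remove_shakiness(frames)
```

In the toy data the stationary flicker is zeroed as soon as some history exists, while the moving pixel survives because it never lands where recent motion was recorded.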

This worked extremely well and left me with only a little bit of remaining noise. So I took it one step further: I took the noise from the previous frames and arbitrarily shook it a little bit more, spreading the noise left and right, up and down.

def _get_cumulative_previous_motion(self, array_list):
    # Per-pixel maximum of all the previous motion frames.
    resultant_array = np.zeros(array_list[0].shape)
    for array in array_list:
        resultant_array = np.maximum(resultant_array, array)

    # Smear the accumulated motion by a few pixels in every direction.
    # Note that np.roll wraps around at the image edges.
    for y_offset in range(-self.Y_PIXEL_RANGE, self.Y_PIXEL_RANGE + 1):
        for x_offset in range(-self.X_PIXEL_RANGE, self.X_PIXEL_RANGE + 1):
            offset_array = np.roll(resultant_array, x_offset, axis=1)
            offset_array = np.roll(offset_array, y_offset, axis=0)
            resultant_array = np.maximum(resultant_array, offset_array)
    return resultant_array
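Spreading motion like this is close in spirit to a maximum filter over a small rectangular neighbourhood (what a morphological dilation with a rectangular kernel does). The sketch below is one alternative way to write that, not the repo's code: it zero-pads instead of rolling, so nothing wraps around the image edges, and every shift is taken from the original array so the spread stays within the stated range. The function name and defaults are hypothetical.

```python
import numpy as np

def spread_motion(motion, x_range=2, y_range=2):
    """Per-pixel maximum over a (2*y_range + 1) x (2*x_range + 1) neighbourhood."""
    height, width = motion.shape[:2]
    # Zero-pad the two spatial axes; leave any colour axis alone.
    pad = [(y_range, y_range), (x_range, x_range)] + [(0, 0)] * (motion.ndim - 2)
    padded = np.pad(motion, pad, mode="constant")
    spread = np.zeros_like(motion)
    for dy in range(2 * y_range + 1):
        for dx in range(2 * x_range + 1):
            # Each slice is the original image shifted by one offset in the range.
            window = padded[dy:dy + height, dx:dx + width]
            spread = np.maximum(spread, window)
    return spread
```

A single bright pixel becomes a filled rectangle of the neighbourhood's size, which is the "extra shake" effect applied to the accumulated noise before it is subtracted.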


If you check out the screenshot below you might be able to see where I’m going with all this:
[Screenshot taken 2014-12-01]

Here we can see the aggregation of motion by column, the union of all motion across all frames, and a frame from the original video. From here I should be able to find some efficient methods for finding the barbell’s width.
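One possible way to turn the per-column aggregation into a width estimate is sketched below. This is a guess at a next step rather than anything from the repo: it assumes grayscale motion frames as 2-D NumPy arrays, and both function names and the `fraction` cutoff are hypothetical.

```python
import numpy as np

def column_motion_profile(motion_frames):
    """Sum motion over every frame and every row, giving one total per column."""
    total = np.zeros(motion_frames[0].shape[1], dtype=np.float64)
    for frame in motion_frames:
        total += frame.sum(axis=0)
    return total

def active_column_span(profile, fraction=0.5):
    """Return (left, right) of the columns holding at least `fraction` of the peak."""
    peak = profile.max()
    if peak <= 0:
        return None
    active = np.flatnonzero(profile >= fraction * peak)
    return int(active[0]), int(active[-1])
```

The span between the leftmost and rightmost active columns gives a crude horizontal extent; faint leftover noise falls below the cutoff as long as it is much weaker than the barbell's motion.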

The End

From here I can re-approach the problem of getting a bounding box of motion, but the above solution might be generic enough to help solve some other problems.

  • Jan de Lange

    What if you would first try to stabilize the footage, based on rigid, non-moving objects, and then try image detection? You have assumed camera movement is a given, and try to deal with it, but what if you could reduce camera movement beforehand?

    • Scott Benedict Lobdell

      That would be ideal for sure if you were able to pinpoint one distinct object for each frame

      • Jan de Lange

        I was thinking along this line: just basic video image stabilization. I tried to stabilize your vid to see what it does, but I ran into an issue (see post), so I cannot show you. If I look at his examples I would say your vid should improve too. To your answer: the algorithm traces the objects, no need to pinpoint.