Using Computer Vision to Calculate a 1 rep max Part IV: The Resurrection

For all of my previous work on this subject, see:

You can also see my generic solution to motion detection on a shaky video at Motion Detection on a Shaky Video with OpenCV + Python

To catch up any stragglers on the internets: I’ve been working on barbell detection in a bench press video in order to calculate a one rep max. It works well, but the problem is that input videos need to be fairly sterile, and a lot of assumptions are made in order to process the video in a reasonable amount of time (e.g. it’s assumed that the barbell passes through the center of the image, which might not actually be true). Additionally, it takes about a full hour to process 20 seconds of video; if I’d like to make this processor available to all of the meat heads on the internets at scale, it should really be faster.

I had some success in isolating a moving object in a video in my last post, but now I want to apply that specifically to the one rep max calculator I’ve been working on. Here’s where I left off from my last post:

If I have an input video like this:

Screen Shot 2014-12-04 at 4.21.13 PM



I can output a frame representing the aggregated motion in the image like this:

Screen Shot 2014-12-04 at 8.54.05 AM

I can aggregate all of the motion across the entire video into a single frame, filtering out most of the noise resulting from shakiness in the process. The goal is to find the width of the barbell using motion, and since the barbell moves strictly vertically, we can try to detect its width by finding the clear band of columns that represents the bar. If we collapse the columns into a single row, we get something like this:

Screen Shot 2014-12-04 at 4.20.29 PM
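The collapse step can be sketched with NumPy's axis-wise sum, assuming the aggregated motion frame is a 2-D grayscale array; the `motion_matrix` here is a made-up miniature example, not a real frame:

```python
import numpy as np

# Hypothetical aggregated-motion frame: white (255) where motion occurred.
motion_matrix = np.zeros((4, 6), dtype=np.uint8)
motion_matrix[:, 2:5] = 255  # a vertical band of motion, like a rising bar

# Collapse every column into a single value by summing down the rows.
motion_by_column = motion_matrix.sum(axis=0)
print(motion_by_column)  # columns 2-4 stand out: [0 0 1020 1020 1020 0]
```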

As it stands, the row looks pretty noisy. If we can filter out the motion that isn’t associated with the barbell, based on the rows it appears in, we should be able to get much better results. So from here, I wrote a heuristic to detect which row the barbell’s motion appeared in most. You could use a simple algorithm and just pick the row with the most motion, but a greedy approach that simple is prone to clear outliers.

My solution was to find a row that had a lot of motion and whose pixel-to-pixel differential was pretty consistent. If you imagine a solid white line of pixels, the differential from column to column is 0, and this is presumably what an ideal barbell looks like. However, a solid black line representing no motion has that same “good” quality of consistently low differentials. So, to maximize the motion while minimizing the differential (to find long runs of white pixels), I used the formula:

(pixel_sum * pixel_sum / standard_deviation(derivative(row_array)))

The python code looks like this:

    def _get_best_object_motion_row(self, motion_matrix):
        rows, cols = motion_matrix.shape[0: 2]
        row_scores = np.zeros(rows)
        for row_index in xrange(rows):
            row_array = motion_matrix[row_index, :]
            # Reward rows containing a lot of motion...
            motion_score = np.sum(row_array) ** 2
            # ...and penalize rows whose column-to-column changes are erratic
            differential_score = np.std(np.abs(np.diff(row_array.astype(int))))
            row_scores[row_index] = motion_score / differential_score
        return row_scores.argmax()
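As a quick sanity check, here’s the same heuristic as a standalone function, run on a tiny synthetic matrix; I’ve added a small guard against dividing by zero, which the method above would hit on a perfectly uniform row:

```python
import numpy as np

def best_object_motion_row(motion_matrix):
    rows = motion_matrix.shape[0]
    row_scores = np.zeros(rows)
    for row_index in range(rows):
        row = motion_matrix[row_index, :]
        motion = np.sum(row) ** 2
        spread = np.std(np.abs(np.diff(row.astype(int))))
        # Guard: a perfectly uniform row has zero spread; score it on motion alone.
        row_scores[row_index] = motion / spread if spread else motion
    return row_scores.argmax()

demo = np.array([
    [0,     0,   0,   0,   0,   0],  # no motion at all
    [0,   255,   0, 200,   0, 255],  # bright but erratic (noise)
    [200, 200, 200, 200, 200, 200],  # solid and consistent: the "barbell" row
])
print(best_object_motion_row(demo))  # prints 2
```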

And when we then collapse the best row and its immediate surrounding rows we get a cleaner looking image (it’s subtle, but compare this to the above image):

Screen Shot 2014-12-04 at 8.54.10 AM

Now you can see the motion that likely corresponds to the barbell a little more clearly. To explain it with pictures: if we have the above image as input, to find the barbell we want to do something to produce the output below:


This isn’t an immediately intuitive problem, because the data can easily change from input to input, and the same thresholds won’t apply across the board. In order to examine the problem more closely, I graphed the values in the resultant row so I could find a better way to prune the data. The results looked like this:

Screen Shot 2014-12-04 at 3.41.41 PM

The graph illustrates the values in the row more clearly than a grayscale image does. Looking at it, you could probably intuitively predict which set of points corresponds to the barbell, but that intuition needs to be translated into computer instructions. Jumping ahead a bit: after some thought, I developed an algorithm that estimates a barbell width, which can be graphed as follows:

Screen Shot 2014-12-04 at 8.54.17 AM

The algorithm here was to iterate over all possible x offsets and all possible bar widths, find the minimum of each resulting collection of points, and multiply it by the bar width. So we’re effectively trying to find the rectangle of greatest area that fits under the graph without crossing any lines. The code for that algorithm looks like this:

    def _find_width(self, motion_by_column):
        """Find the best combo of (x_offset, width), i.e. the one
        where min(window) * width is greatest."""
        min_pixel_width = int(len(motion_by_column) * self.MIN_BAR_AS_PERCENT_OF_SCREEN)

        best_score = 0
        best_width = min_pixel_width
        best_x_offset = 0

        for x_offset in xrange(len(motion_by_column) - min_pixel_width):
            for bar_width in xrange(min_pixel_width, len(motion_by_column)):
                if x_offset + bar_width >= len(motion_by_column):
                    break
                y_values = motion_by_column[x_offset: x_offset + bar_width]
                min_val = np.min(y_values)
                score = min_val * bar_width
                if score > best_score:
                    best_score = score
                    best_x_offset = x_offset
                    best_width = bar_width
        return best_x_offset, best_width
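To make the search concrete, here’s the same brute-force rectangle scan as a standalone function run on a toy motion profile; the profile values and the 30% minimum-width assumption are made up for the example:

```python
import numpy as np

def find_width(motion_by_column, min_bar_fraction=0.3):
    # Try every (x_offset, bar_width) rectangle that fits under the curve
    # and keep the one with the greatest area: min(window) * width.
    n = len(motion_by_column)
    min_width = int(n * min_bar_fraction)
    best_score, best_x, best_width = 0, 0, min_width
    for x in range(n - min_width + 1):
        for w in range(min_width, n - x + 1):
            window = motion_by_column[x:x + w]
            score = np.min(window) * w
            if score > best_score:
                best_score, best_x, best_width = score, x, w
    return best_x, best_width

# Toy motion profile: a bright plateau (the bar) with dim noise on either side.
profile = np.array([0, 0, 50, 200, 210, 220, 215, 205, 40, 0])
print(find_width(profile))  # prints (3, 5)
```

Widening the window past the plateau would pull the minimum down to the noise values, so the 5-wide rectangle over the plateau wins.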

This worked really well, but when I ran the algorithm across some of my input videos, it ran into problems. It’s hard to see, but in the below video, the detected width of the barbell is too short. It stops before it gets to the right 45 lb plate:

Screen Shot 2014-12-04 at 8.56.03 AM

If we examine all the data associated with the video, it’s a little clearer what’s happening:

Screen Shot 2014-12-04 at 8.55.44 AM

In the top left of the above picture, you can see the aggregated motion. In the bottom left, you can see the motion detection collapsed to one row. And on the right side, you can see how the barbell width was found. In this case, the input video is really clean with no shakiness, and the barbell should span the full start-to-stop width of the line in the graph.

The problem is that there was a small range of pixels in which very little motion was detected, because of the colors of the pixels in the image. If we apply a Gaussian blur to the collapsed row, we might be able to bridge that gap. So I applied a Gaussian line-smoothing algorithm. The code for that is here:

    def smooth_list_gaussian(input_list, degree=8):
        window = degree * 2 - 1
        weight = np.array([1.0] * window)
        weightGauss = []
        for i in range(window):
            i = i - degree + 1
            frac = i / float(window)
            gauss = 1 / (np.exp((4 * frac) ** 2))
            weightGauss.append(gauss)
        weight = np.array(weightGauss) * weight
        smoothed = [0.0] * (len(input_list) - window)
        for i in range(len(smoothed)):
            smoothed[i] = sum(np.array(input_list[i: i + window]) * weight) / sum(weight)
        smoothed = [0 for _ in xrange(degree)] + smoothed
        return smoothed
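For comparison, the same idea can be written more compactly by building a normalized Gaussian kernel and running it through `np.convolve`; the `sigma` value and the profile here are arbitrary choices for the example:

```python
import numpy as np

def smooth_gaussian(values, sigma=1.0):
    # Build a symmetric, normalized Gaussian kernel out to about 3 sigma.
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()  # normalize so flat regions keep their value
    return np.convolve(values, kernel, mode='same')

# A motion profile with a small dead spot in the middle of the bar:
profile = np.array([0, 0, 200, 200, 0, 200, 200, 0, 0], dtype=float)
smoothed = smooth_gaussian(profile)
# The zero gap at index 4 is now bridged by its bright neighbors.
```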

Now if I apply the barbell detection to the smoothed line, this is what the results look like (note that it still fails):

Screen Shot 2014-12-04 at 11.40.48 AM

In this case, the algorithm just didn’t work, but I’d rather call this input bad than mess up the processing of all the other videos.

Final Algorithm

The above process is pretty much encapsulated in one method, which should make the whole flow fairly straightforward:

    def find_barbell_width(self):
        probable_barbell_row = self._get_best_object_motion_row(self.union_frame)
        cropped_aggregate_motion = self._get_cropped_matrix(self.union_frame, probable_barbell_row)
        displayable_frame = self._make_frame_displayable(cropped_aggregate_motion)
        motion_by_column = self._collapse_motion_to_one_row(displayable_frame)
        smoothed_motion_by_column = smooth_list_gaussian(motion_by_column)
        x_offset, bar_width = self._find_width(smoothed_motion_by_column)
        return x_offset, bar_width


The algorithm now is drastically faster. Rather than an hour of processing for 20 seconds of video, I’m down to about 30 seconds. The added simplicity also allowed me to take a chainsaw to my existing barbell detection algorithm and delete about 100 lines of complicated thresholds and otherwise poorly written functions.

The End