Using Computer Vision to Calculate a One Rep Max Part IV: The Resurrection

For all of my previous work on this subject, see:
- Using Computer Vision to Calculate a One Rep Max
- Using Computer Vision to Calculate a One Rep Max Part II: The Reckoning
- How to Install OpenCV on Heroku
- Using Computer Vision to Calculate a One Rep Max Part III: The Redemption
You can also see my generic solution to motion detection on a shaky video at Motion Detection on a Shaky Video with OpenCV + Python
To catch up any stragglers on the internets, I’ve been working on barbell detection in a bench press video in order to calculate a one rep max. It works well, but input videos need to be fairly sterile, and the processor makes a lot of assumptions in order to finish in a reasonable amount of time (e.g. it assumes that the barbell passes through the center of the image, which might not actually be true). Additionally, it takes about a full hour to process 20 seconds of video. If I’d like to make this processor available to all of the meat heads on the internets at scale, it really needs to be faster.
I had some success in isolating a moving object in a video in my last post, but now I want to apply that specifically to the one rep max calculator I’ve been working on. Here’s where I left off from my last post:
If I have an input video like this:
I can output a frame representing the aggregated motion in the image like this:
I can aggregate all of the motion across the entire video into a single frame and filter out most of the noise resulting from shakiness in the process. The goal is to find the width of the barbell using motion, and since the barbell is moving strictly vertically, we can try to detect its width by finding the clear collection of columns that represent that bar. If we collapse the columns into a single row, we get something like this:
As it stands, the row looks pretty noisy. If we can filter out the motion that isn’t associated with the barbell based on the rows it appears in, we should be able to get much better results. So from here, I wrote a heuristic to detect which row the barbell’s motion appeared in most. You could use a simple algorithm and pick the row with the most total motion, but a greedy approach that simple is prone to being thrown off by outliers.
My solution was to find a row that had a lot of motion and whose column-to-column differential was pretty consistent. If you imagine a solid white line of pixels, the differential from column to column is 0, and this is what an ideal barbell presumably looks like. However, a solid black line representing no motion will have that same “good” value of consistently low differentials. So in order to maximize the motion and minimize the variation in the differential to try to find long runs of white pixels, I used the formula:
(pixel_sum * pixel_sum / standard_deviation(derivative(row_array)))
The python code looks like this:
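import numpy as np

def _get_best_object_motion_row(self, motion_matrix):
    rows, cols = motion_matrix.shape[0:2]
    row_scores = np.zeros(rows)
    for row_index in range(rows):
        row_array = motion_matrix[row_index, :]
        # Reward rows that contain a lot of motion...
        motion_score = np.sum(row_array) ** 2
        # ...and penalize rows whose column-to-column changes are erratic
        differential_score = np.std(np.abs(np.diff(row_array.astype(int))))
        # Guard against 0 / 0 (NaN) on a perfectly flat row, e.g. all black;
        # np.argmax would otherwise pick the first NaN row
        row_scores[row_index] = motion_score / max(differential_score, 1e-6)
    return row_scores.argmax()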
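To see why the score beats plain "most motion", here is a minimal standalone sketch run on a made-up frame. The function name `best_motion_row`, the `eps` guard, and the synthetic pixel values are all my own assumptions for illustration, not part of the processor itself. It is the same score as above, just vectorized across rows.

```python
import numpy as np

def best_motion_row(motion_matrix, eps=1e-6):
    """Score every row as sum(row)^2 / std(|diff(row)|) and return the best index."""
    rows = motion_matrix.astype(np.int64)           # avoid uint8 overflow
    motion = rows.sum(axis=1) ** 2                  # lots of motion is good
    roughness = np.abs(np.diff(rows, axis=1)).std(axis=1)  # erratic rows are bad
    scores = motion / np.maximum(roughness, eps)    # eps avoids dividing by zero
    return int(scores.argmax())

# Synthetic 5 x 100 motion frame (hypothetical data)
frame = np.zeros((5, 100), dtype=np.uint8)
# Row 1: bright but patchy motion, slightly more total motion than the bar
frame[1] = np.tile([255, 255, 0, 255, 255, 255, 40, 255, 255, 255], 10)
# Row 3: a solid, evenly lit "barbell"
frame[3] = 200

naive_row = int(frame.sum(axis=1).argmax())  # 1: the patchy row wins on raw motion
scored_row = best_motion_row(frame)          # 3: the solid row wins on the score
print(naive_row, scored_row)
```

One quirk worth knowing: because the score looks at the standard deviation of the absolute differences, a row that alternates perfectly between two values also has a deviation of 0, so it would score as well as a solid line.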
And when we then collapse the best row and its immediate surrounding rows we get a cleaner looking image (it’s subtle, but compare this to the above image):
Now you can see the motion that likely corresponds to a barbell a little more clearly. To explain it with pictures, if we have the above image as input, we want to do something to find the barbell and produce the below output:
This isn’t an immediately intuitive problem because data can easily change from input to input, and the same thresholds won’t apply across the board. In order to examine the problem more closely, I graphed the resultant row so I could find a better way to prune the data. When I graphed the values in the row, the results looked like this:
The graph illustrates the values in the row more clearly than a grayscale image. If you were to look at this, you could probably intuitively predict which set of points corresponds to the barbell, but this needs to be translated into computer instructions. To jump ahead a bit, after some thinking I developed an algorithm that estimates the barbell width, and the result can be graphed as follows:
The algorithm iterates over candidate x offsets and candidate bar widths, finds the minimum value among the points in each candidate span, and multiplies that minimum by the bar width. So we’re effectively trying to find a rectangle that fits under the graph and maximizes its area without crossing any lines. The code for that algorithm looks like this:
def _find_width(self, motion_by_column):
    '''
    Find the best combo of x_offset, width where
    width * max(min(combination)) is greatest
    '''
    min_pixel_width = int(len(motion_by_column) * self.MIN_BAR_AS_PERCENT_OF_SCREEN)
    best_score = 0
    best_width = min_pixel_width
    best_x_offset = 0
    # Note: x offsets are only tried up to min_pixel_width
    for x_offset in range(min_pixel_width):
        for bar_width in range(min_pixel_width, len(motion_by_column)):
            # Skip candidates that would run off the right edge
            if x_offset + bar_width >= len(motion_by_column):
                continue
            y_values = motion_by_column[x_offset: x_offset + bar_width]
            # The rectangle can only be as tall as its lowest point
            min_val = np.min(y_values)
            score = min_val * bar_width
            if score > best_score:
                best_score = score
                best_x_offset = x_offset
                best_width = bar_width
    return best_x_offset, best_width
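The rectangle search above is essentially the classic "largest rectangle in a histogram" problem, which can be solved in a single pass with a stack instead of nested loops. The sketch below is that swapped-in technique, not the code I actually ran. The name `largest_rectangle` and the `min_width` parameter are assumptions, and unlike `_find_width` it considers every x offset and allows the bar to reach the last column.

```python
def largest_rectangle(heights, min_width=1):
    """Return (x_offset, width) of the largest-area rectangle under `heights`
    that is at least `min_width` columns wide. Assumes non-negative heights."""
    best_area, best_x, best_width = 0, 0, min_width
    stack = []  # column indices whose heights are strictly increasing
    # A trailing 0 forces every remaining column off the stack at the end
    for i, h in enumerate(list(heights) + [0]):
        while stack and heights[stack[-1]] >= h:
            height = heights[stack.pop()]
            # The popped column is the lowest point of the span [left, i)
            left = stack[-1] + 1 if stack else 0
            width = i - left
            if width >= min_width and height * width > best_area:
                best_area, best_x, best_width = height * width, left, width
        stack.append(i)
    return best_x, best_width

# Hypothetical collapsed motion values
column_motion = [0, 1, 5, 6, 5, 2, 6, 6, 0]
print(largest_rectangle(column_motion))               # (2, 3): height 5 * width 3
print(largest_rectangle(column_motion, min_width=4))  # (2, 6): height 2 * width 6
```

The minimum-width filter is safe here because widening a span until it hits a lower column never reduces its area, so the best allowed rectangle is always one of the spans the stack produces.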
This worked really well, but when I ran the algorithm across some of my input videos, it ran into problems. It’s hard to see, but in the below video, the detected width of the barbell is too short. It stops before it gets to the right 45 lb plate:
If we examine all the data associated with the video, it’s a little more clear what’s happening:
In the above picture in the top left you can see the aggregated motion. In the bottom left you can see the motion detection collapsed to one row. And on the right side you can see how the barbell width was found. In this case, the input video is really clean with no shakiness, and the barbell should span from the start to the stop of the line in the graph.
The problem is that there was a small range of pixels in which very little motion was detected because of the colors of the pixels in the image. If I apply something like a Gaussian blur to the data, we might be able to overcome that gap. So I applied a line smoothing algorithm. The code for that is here:
import numpy as np

def smooth_list_gaussian(input_list, degree=8):
    window = degree * 2 - 1
    weight = np.array([1.0] * window)
    # Build Gaussian-shaped weights centered on the middle of the window
    weightGauss = []
    for i in range(window):
        i = i - degree + 1
        frac = i / float(window)
        gauss = 1 / (np.exp((4 * (frac)) ** 2))
        weightGauss.append(gauss)
    weight = np.array(weightGauss) * weight
    # Weighted moving average over each full window
    smoothed = [0.0] * (len(input_list) - window)
    for i in range(len(smoothed)):
        smoothed[i] = sum(np.array(input_list[i: i + window]) * weight) / sum(weight)
    # Pad the front with zeros to roughly realign with the input
    smoothed = [0 for _ in range(degree)] + smoothed
    return smoothed
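For comparison, here is a sketch of the same idea using `np.convolve`, with the same kernel shape as the weights above. This is an alternative I'm assuming would behave similarly, not the code from the processor. The name `smooth_same_length` and the synthetic signal are made up for illustration. The main difference is that `mode="same"` keeps the output the same length as the input and centered on it.

```python
import numpy as np

def smooth_same_length(values, degree=8):
    """Gaussian-weighted moving average that preserves the input length."""
    window = degree * 2 - 1
    # Offsets -(degree - 1) .. (degree - 1), scaled like the weights above
    frac = (np.arange(window) - degree + 1) / float(window)
    kernel = np.exp(-(4 * frac) ** 2)
    kernel /= kernel.sum()  # normalize so flat regions keep their value
    return np.convolve(np.asarray(values, dtype=float), kernel, mode="same")

# Hypothetical collapsed motion: a bar from column 20 to 79 with a 4 column gap
signal = np.zeros(100)
signal[20:80] = 100
signal[48:52] = 0

smoothed = smooth_same_length(signal)
print(signal[48:52].max())    # 0.0: the gap splits the bar in two
print(smoothed[48:52].min())  # well above 0: the gap is bridged
```

Since the width search multiplies by the lowest point in a span, lifting the gap off of zero is what lets a single rectangle stretch across it.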
Now if I apply the barbell detection to the smoothed line, this is what the results look like (note that it still fails):
In this case, the algorithm just didn’t work, but I’d rather go ahead and call this input bad than mess up the processing of all the other videos.
Final Algorithm
The above process is pretty much encapsulated in one method that should make the whole process fairly straightforward:
def find_barbell_width(self):
    # Find the row where the barbell's motion most likely lives
    probable_barbell_row = self._get_best_object_motion_row(self.union_frame)
    # Keep only that row and its neighbors, then collapse to a single row
    cropped_aggregate_motion = self._get_cropped_matrix(self.union_frame, probable_barbell_row)
    displayable_frame = self._make_frame_displayable(cropped_aggregate_motion)
    motion_by_column = self._collapse_motion_to_one_row(displayable_frame)
    # Smooth over small gaps before fitting the rectangle
    smoothed_motion_by_column = smooth_list_gaussian(motion_by_column)
    x_offset, bar_width = self._find_width(smoothed_motion_by_column)
    return x_offset, bar_width
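The cropping and collapsing helpers aren't shown in this post, so here is a rough sketch of what they might look like as standalone functions. The names `crop_around_row` and `collapse_to_one_row`, the `margin` parameter, and the choice to sum each column are all assumptions on my part, so treat this as an illustration of the idea rather than the real implementation.

```python
import numpy as np

def crop_around_row(matrix, row, margin=10):
    """Keep only `row` and up to `margin` rows above and below it."""
    top = max(row - margin, 0)
    bottom = min(row + margin + 1, matrix.shape[0])
    return matrix[top:bottom, :]

def collapse_to_one_row(matrix):
    """Sum every column so the 2D motion frame becomes a single row."""
    return matrix.astype(np.int64).sum(axis=0)

# Hypothetical frame: a short "bar" on row 20 and faint noise on row 45
frame = np.zeros((50, 8), dtype=np.uint8)
frame[20, 2:6] = 255
frame[45, :] = 30

cropped = crop_around_row(frame, 20, margin=3)
motion_by_column = collapse_to_one_row(cropped)
print(cropped.shape)              # (7, 8)
print(motion_by_column.tolist())  # [0, 0, 255, 255, 255, 255, 0, 0]
```

Collapsing the uncropped frame would add 30 to every column from the noisy row, which is the kind of unrelated motion the crop is meant to throw away.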
Results
The algorithm is now drastically faster. Rather than one hour of processing for 20 seconds of video, I’m down to about 30 seconds. The added simplicity also allowed me to take a chainsaw to my existing barbell detection algorithm and delete about 100 lines of complicated thresholds and otherwise poorly written functions.