Lately I’ve been working on a small side project to try to calculate a one rep max on a bench press using only a regular camera and computer vision. You can see my earlier blog post for how I got the basic detection working. Since then, I’ve been working to refine the algorithm so the inputs don’t have to be sterile for the project to work. Additionally, my previous post covered basic barbell detection but had not yet calculated a one rep max.

# Anomaly Detection

When I first started this project, I’d asked one of my colleagues, Doctor Akshay “Aktavius” Shah (known primarily for his work with science), if anomaly detection was simply a matter of taking the standard deviation of a collection of data and discarding results beyond some multiple of the standard deviation. The answer, of course, was a resounding “No.” There is no mathematical definition for what constitutes an anomaly, so in the case of accurately detecting barbells in frames of a video, we have to constrain the problem as much as possible based on our use case.

# Known Constraints

As kind of a spoiler to this story, these are the constraints that ended up being taken advantage of in order to establish the size and location of a barbell in as many frames of a video as possible:

• The barbell will be moving
• The barbell will be an Olympic barbell and will therefore have known dimensions
• The barbell will be the widest moving object in the frame
• The angle of the barbell in the image will be close to 0
• All of the pixels inside of a barbell detection will be relatively symmetrical
• Detected barbells will be relatively similar in appearance across every frame
• The range of motion of the barbell in the image will not exceed the range of motion of me squatting a barbell (about 40 inches)
• The video itself will be relatively stable

# Step 1: Find a probable barbell detection using motion detection

The initial method to get the ball rolling for finding a barbell across all frames of the video is motion detection. This is a simple method that doesn’t require much processing power, based on the reasonable assumption that the barbell will be moving in the video. I actually just used brute force to find the best match between a white-pixel outline of an Olympic barbell and the motion detection frame. Across all possible positions and all possible sizes, whichever combination had the LEAST impact on the image indicated the best match.
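A minimal sketch of that brute-force search, assuming grayscale frames, a simple frame-difference motion mask, and a thin rectangular bar template (the function names, threshold, and aspect ratio here are my own illustrations, not from the original code):

```python
import numpy as np

def motion_mask(prev_frame, frame, threshold=25):
    """Binary mask of pixels that changed between consecutive grayscale frames."""
    diff = np.abs(frame.astype(int) - prev_frame.astype(int))
    return (diff > threshold).astype(np.uint8)

def best_barbell_match(mask, widths, bar_aspect=0.05):
    """Brute-force search: slide a solid bar-shaped window over the motion
    mask at every position and candidate width. The placement whose window
    is most fully covered by motion pixels (i.e. erasing the bar there would
    leave the least remaining change) wins."""
    h, w = mask.shape
    best = None  # (score, x, y, width); lower score is a better match
    for bw in widths:
        bh = max(1, int(bw * bar_aspect))  # the bar is long and thin
        for y in range(h - bh):
            for x in range(w - bw):
                window = mask[y:y + bh, x:x + bw]
                # Motion pixels NOT covered by the bar window: lower is better.
                score = window.size - window.sum()
                if best is None or score < best[0]:
                    best = (score, x, y, bw)
    return best
```

This is deliberately naive and slow (every position times every width); it just makes the “least impact” idea concrete.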

# Step 2: Filter detections by some reasonable assumptions and thresholds

After the initial barbell detection, you can see what typical results for position might look like below:

It’s quite clear to a human what the anomalies are, but it’s not necessarily trivial to create hard and fast thresholds to discard data. While rules to detect anomalies might work for the specific case above, we need to create rules that apply to every single dataset.

In my case, I conservatively discarded some results based on our use case:

• Small barbell detections were discarded based on an assumption that among all of the detections, a barbell width will never be more than 10% smaller than other detected barbell widths
• Unreasonable ranges of motion were discarded. If a delta existed between points of greater than 40 inches, I could determine the outlier between the two based on the average of all of the detected positions
• Any barbell detections outside of 3 standard deviations were discarded (see three sigma rule)
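The three filters above might look something like this sketch, which assumes detections are represented by their widths and vertical positions in pixels (the function name and exact comparison order are my own; `max_travel_px` stands in for the 40-inch limit converted to pixels):

```python
import numpy as np

def filter_detections(widths, ys, max_travel_px):
    """Return indices of detections that survive the three conservative
    filters: width tolerance, range-of-motion limit, and the 3-sigma rule."""
    widths = np.asarray(widths, dtype=float)
    ys = np.asarray(ys, dtype=float)
    keep = np.ones(len(ys), dtype=bool)

    # 1. Discard detections more than 10% narrower than the typical width.
    keep &= widths >= 0.9 * np.median(widths)

    # 2. Discard positions farther from the mean position than the
    #    physically possible range of motion (the outlier side of a delta).
    mean_y = ys[keep].mean()
    keep &= np.abs(ys - mean_y) <= max_travel_px

    # 3. Three-sigma rule on vertical position.
    mu, sigma = ys[keep].mean(), ys[keep].std()
    if sigma > 0:
        keep &= np.abs(ys - mu) <= 3 * sigma
    return np.where(keep)[0]
```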

# Step 3: Run the detection algorithm again, but constrain the barbell size to a fixed width

After a first pass at barbell detection, it’s cheap to run the detection again with an assumed barbell width, since we no longer have to brute force possible barbell sizes. Given that some results were already likely discarded, we can run the detection algorithm again with a fixed barbell size based on the mean width of the remaining results.

# Step 4: Do some crazy math and make up some formulas that turn out to work in order to discard bad detections

At this point, all of the barbell detections have been established purely from motion detection, and the data is mostly good, but not nearly good enough to do some science. We still haven’t taken advantage of the original RGB pixels from the input video.

From some of the earlier assumptions I mentioned, we can filter out bad barbell detections by checking the symmetry of the pixels within the detected locations and the likeness between that detection and every other detection we made. Since most of our detections are presumably good (say, at least better than 50%) and because a barbell is symmetrical, we can score our detections. So here, I took each detection and determined a score based on (symmetry * likeness * initial_impact), where a lower score is better. Initial impact is the score from the initial motion detection algorithm. For two different sample videos, you can see a graph of the score below:
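One plausible way to compute those two factors, assuming each detection is a cropped RGB/grayscale patch (the mirror-difference and mean-patch comparisons are my interpretation of “symmetry” and “likeness”, not the original implementation):

```python
import numpy as np

def symmetry_score(patch):
    """Mean absolute difference between a patch and its horizontal mirror;
    a symmetrical barbell patch scores near zero."""
    p = patch.astype(float)
    return float(np.abs(p - p[:, ::-1]).mean())

def likeness_score(patch, mean_patch):
    """Mean absolute difference from the average of all detected patches;
    a typical-looking detection scores near zero."""
    return float(np.abs(patch.astype(float) - mean_patch).mean())

def detection_score(patch, mean_patch, initial_impact):
    # Lower is better on all three factors, so the product is low only
    # when the detection is symmetric, typical, and was a strong match.
    return symmetry_score(patch) * likeness_score(patch, mean_patch) * initial_impact
```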

In the graph, the higher numbers are bad, so we want to discard the “higher” numbers, but the question is: where should the cutoff be made? One additional note from the graph immediately above is that there are two clear outlier points at x0 and x1, so we want to discard those results as well. After a bit of trial and error, I came up with a formula that worked very well:

Take the standard deviation of the derivative of the points and divide by the square of the number of points in the graph. Do this for every possible combination of a starting x and a number of elements. The smallest score represents the best combination.
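That search can be sketched as follows, assuming the per-detection scores are ordered as plotted (the tie-breaking toward longer windows is my own addition to make the result deterministic):

```python
import numpy as np

def best_window(scores, min_len=3):
    """For every combination of a starting x and a number of elements,
    compute std(diff(window)) / len(window)**2; the smallest value keeps
    the longest flat run and cuts steep outliers at either end."""
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    best = None  # (value, start, length)
    for start in range(n - min_len + 1):
        for length in range(min_len, n - start + 1):
            window = scores[start:start + length]
            value = np.diff(window).std() / length ** 2
            if (best is None or value < best[0]
                    or (value == best[0] and length > best[2])):
                best = (value, start, length)
    return best[1], best[2]
```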

Ironically, I only graphed the data for this blog post, and it looks like I might have been able to find a simpler formula using second derivatives and inflection points, but whatever.

From here, we simply filter our data based on the above starting X and the number of points to use.

# Step 5: Decide on the final barbell size based on absolute best detections

I actually ran the barbell detection algorithm one final time, but in this case, I used the best possible barbell size based on the data collected so far. Previously, I had just used a mean which included both good and bad data. However, based on the previous scoring algorithm we can now rank our detections. So in this case, I took the mean of the top 10% of the results and used that as the final barbell size.
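The top-10% mean is a one-liner; here is a sketch (function name mine) that ranks by score, lower being better:

```python
import numpy as np

def final_bar_width(widths, scores, top_fraction=0.10):
    """Average the widths of the best-scoring fraction of detections
    (lower score = better detection) to fix the final barbell size."""
    order = np.argsort(scores)                    # best detections first
    k = max(1, int(len(order) * top_fraction))    # at least one detection
    return float(np.asarray(widths, dtype=float)[order[:k]].mean())
```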

Now, when we run the barbell detection algorithm, we also avoid trying to detect on frames where we got bad data (from the above filtering based on rank ordering of the detections). Now we finally have our resulting graph of position:

# Step 6: Do some science on the position graph

At this point, we want to apply some types of formulas to the data to extract a one rep max. Whether it’s using some type of correlation with average velocity or just using physics and Newton’s second law (I’m a big fan of the third law myself, but I digress), we need to identify which points belong to which rep.

We can do this by identifying local minima and maxima, which correspond to the top and the bottom of the rep, but this isn’t as straightforward as you might think. The data isn’t necessarily sterile: minor fluctuations still exist, creating very small minima and maxima that are actually just noise.

The solution I found was to sequentially identify the start and stop of each rep. Each start or stop will either be a maximum or a minimum, or it will be the very first or very last point on the chart. Starting from the first point, we find the corresponding maximum or minimum (depending on whether the current point is a minimum or maximum) by trying to maximize the y delta while minimizing the actual distance travelled across points to get there. In this way, noise will likely be avoided since its delta y is tiny, and we won’t overreach into a different repetition entirely because the distance traversed would greatly expand.
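One way to make that trade-off concrete is to score each candidate turning point by net delta squared over path length, so noise (path without delta) and overreach into the next rep (path that cancels its own delta) are both punished. This is my interpretation of the idea, not the original code:

```python
import numpy as np

def segment_reps(ys):
    """Walk the position trace, repeatedly jumping to the point that best
    trades a large net y delta against the path length needed to reach it.
    Returns the indices of the rep turning points (tops and bottoms)."""
    ys = np.asarray(ys, dtype=float)
    steps = np.abs(np.diff(ys))
    turning = [0]
    i = 0
    while i < len(ys) - 1:
        best_j, best_score = None, 0.0
        for j in range(i + 1, len(ys)):
            delta = abs(ys[j] - ys[i])   # net vertical movement
            path = steps[i:j].sum()      # actual distance travelled
            if path == 0:
                continue
            score = delta * delta / path  # rewards delta, punishes detours
            if score > best_score:
                best_j, best_score = j, score
        if best_j is None:
            break
        turning.append(best_j)
        i = best_j
    return turning
```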

You can see below a chart with the local maxima and minima identified as green and red dots, respectively:

# Step 7: Set a 1 rep max multiplier per rep using average velocity

I thought that once I had the data it would be pretty easy to establish a one rep max. My thinking was that a lifter would be applying a constant force to a barbell, and therefore:

barbell_mass * acceleration = applied_force = maximum_possible_barbell_mass * gravity

However, you can see just by looking at the above charts of position that a constant acceleration is not applied. If it were, the chart of position would look like a quadratic curve, but it looks more like a straight line after a short, intense burst of acceleration. My initial attempt at establishing a max was based on the maximum acceleration applied during the lift, but that ended up producing enormously high values that were clearly not realistic.

This makes sense because certain parts of a lift, particularly benchpressing, are harder than other parts of the lift. Different muscle groups are used throughout the lift. More force can be generated right out of the eccentric phase of the repetition.

Luckily, I found that I was not the first to try to attempt calculating a one rep max using some types of calculations with position over time. I found numerous research papers on the subject, and typically these studies were done using all sorts of fancy pants sensors and equipment with good results. I could achieve those same good results without the fancy pants sensors.

I found this paper which linked average velocity during a repetition to an estimated one rep max. Their studies basically just ended up using a linear equation based on correlations from a large test group with great accuracy. So my calculations for determining a one rep max actually proved quite simple.

Given a velocity in meters per second, a one rep max score could be spit out. This was quite trivial. With the current data I could determine average velocity in pixels per frame, but this can be converted to meters per second because the length of the barbell is known in pixels, and it is known in meters because of standard Olympic barbell sizes. Frames can be converted to seconds based on the metadata of the input video and its framerate.
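The conversion itself is a couple of multiplications; here is a sketch using the standard 2.2 m length of a men’s Olympic barbell as the scale reference (function and parameter names are mine):

```python
def px_per_frame_to_m_per_s(v_px_per_frame, bar_width_px, fps, bar_length_m=2.2):
    """Convert a velocity in pixels per frame to meters per second, using the
    known physical length of an Olympic barbell to calibrate pixel size and
    the video framerate to convert frames to seconds."""
    meters_per_pixel = bar_length_m / bar_width_px
    return v_px_per_frame * meters_per_pixel * fps
```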

The End