SEO Case Study for Developers: Ranking a Single Page Web Application

-EDIT-

This post is somewhat out of date at this point. Another good resource is from Houston SEO expert Josh Belland (see what I did there?)

There’s a reason they call me “The Peppermill.” It’s because I’m always grinding. To continue with my last several posts, I’ve been working on a web application that predicts a one rep max using computer vision; you can find the other posts elsewhere on this blog.


The final application can be found at OneRepMaxCalculator.com. The problem to tackle now is to actually rank on Google for the term “One Rep Max Calculator” and get on the first page of results. Here’s my ongoing work to rank higher and higher:

Problem 1: How to Get Google to Crawl a Single Page Web Application

For a developer, this is something you’ll likely need to understand more than any of your counterparts. A modern web application with any sort of sleek feel to it is probably using javascript heavily and navigating from page to page with fragment identifiers (# symbols). But it’s not actually page to page. It’s actually one page, because by the very nature of the internets, anything after the fragment identifier isn’t included in the request URL. Search engine crawlers index individual web pages, and, for whatever reason(s), Google will only index URLs that uniquely correspond to a request. After following the Google Developer’s Guide for AJAX Crawling, I guess I messed it up, because after my site had been indexed for the first time, there were only a handful of indexed keywords, clearly all from my one homepage route.
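To make that concrete, the rewrite Google’s crawler applies under its AJAX crawling scheme is mechanical. Here’s a rough sketch of the idea in Python (for illustration only; a real implementation would also URL-encode the fragment value):

    def to_crawlable_url(pretty_url):
        # Googlebot rewrites a "pretty" #! URL into the "ugly" URL it will
        # actually request from the server.
        if '#!' not in pretty_url:
            return pretty_url
        base, fragment = pretty_url.split('#!', 1)
        separator = '&' if '?' in base else '?'
        return '%s%s_escaped_fragment_=%s' % (base, separator, fragment)

    print(to_crawlable_url('http://onerepmaxcalculator.com/#!about'))
    # http://onerepmaxcalculator.com/?_escaped_fragment_=about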

I blame the misunderstanding on the useless pictures on the developer page. Here’s what you need to do to have Google crawl your single page web application:

  • Change fragment identifiers from “#” to “#!”. So if you’re using Backbone.js like I was, all of your routes should look something like:
    • "!upload": "uploadView",
    • "!about": "mainCopy",
    • "!how": "howCopy",
    • "!tips": "tipsCopy"
  • Add this to your base HTML:

    <meta name="fragment" content="!">
            
  • Create static pages that correspond to each view, meant to be seen only by the crawler.

    • URLs like “onerepmaxcalculator.com/#!about” map to “onerepmaxcalculator.com/?_escaped_fragment_=about”
    • You can most definitely automate this (a rough sketch of automating it appears after the code below), but I just manually copied and pasted the page source from the rendered view and saved it to an HTML file. Then I had my server respond with that static file at the above URL for the crawler.
    • My code in Python using Django looked like this:

      from django.shortcuts import render_to_response

      GOOGLE_UGLY_URL = '_escaped_fragment_'


      def static_pages_for_crawlers(request):
          # Googlebot requests "?_escaped_fragment_=about" in place of "#!about",
          # so map each fragment value to its pre-rendered snapshot file.
          requested_page = request.GET[GOOGLE_UGLY_URL]
          requested_page_to_static_file = {
              'account': 'account.html',
              'about': 'about.html',
              'how': 'how.html',
              'tips': 'tips.html',
              'contact': 'contact.html'
          }

          static_file = requested_page_to_static_file[requested_page]
          return render_to_response("static_pages/%s" % static_file, {})


      def home(request):
          if GOOGLE_UGLY_URL in request.GET:
              try:
                  return static_pages_for_crawlers(request)
              except KeyError:
                  pass  # unknown fragment name: fall back to the normal page
          # continue with normal request
      

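As for automating those snapshots: I didn’t actually do this, but a rough sketch using Selenium and a real browser (an assumption on my part; it’s not part of my actual setup) might look something like this:

    import io
    import time

    from selenium import webdriver

    # Render each Backbone view in a real browser, then dump the resulting DOM
    # into the static HTML files that static_pages_for_crawlers() serves.
    VIEWS = ['account', 'about', 'how', 'tips', 'contact']
    PRETTY_URL = 'http://onerepmaxcalculator.com/#!%s'

    driver = webdriver.Firefox()
    for view in VIEWS:
        driver.get(PRETTY_URL % view)
        time.sleep(2)  # crude wait for the JavaScript views to finish rendering
        with io.open('static_pages/%s.html' % view, 'w', encoding='utf-8') as f:
            f.write(driver.page_source)
    driver.quit()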
One thing I also did that may or may not be helpful (but I’m inclined to think it is) was to take advantage of the fact that the rendered page was solely for the search engines. One of the factors that determines page rank is page load time, so stripping any unnecessary javascript or CSS out of these crawler-only pages reduces load time and should help the page rank. It is said that you should avoid gaming the system at all costs, but in this case, I took my cues from Google. Check out this real single page web application from Google:

https://productforums.google.com/forum/#!forum/webmasters

And then its corresponding static URL for search indexing:

https://productforums.google.com/forum/?_escaped_fragment_=forum/webmasters

Here are screenshots from both pages:

[Screenshots of the live #! page and the corresponding _escaped_fragment_ page]

After making the adjustments to all of my settings, you can see that Google is indexing my static pages as I’d been aiming for:
[Screenshot of the static pages showing up in Google’s index]

One last note: I did include a sitemap.xml file that looks something like this (a small sketch for generating it follows the file):

<?xml version="1.0" encoding="UTF-8"?>
<urlset
      xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
            http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
    <url><loc>http://onerepmaxcalculator.com/</loc></url>
    <url><loc>http://onerepmaxcalculator.com/#!contact</loc></url>
    <url><loc>http://onerepmaxcalculator.com/#!account</loc></url>
    <url><loc>http://onerepmaxcalculator.com/#!orientation</loc></url>
    <url><loc>http://onerepmaxcalculator.com/#!youtube/rKBq8mTLRpE</loc></url>
    <url><loc>http://onerepmaxcalculator.com/#!thankyou</loc></url>
    <url><loc>http://onerepmaxcalculator.com/#!summary</loc></url>
    <url><loc>http://onerepmaxcalculator.com/#!upload</loc></url>
    <url><loc>http://onerepmaxcalculator.com/#!about</loc></url>
    <url><loc>http://onerepmaxcalculator.com/#!how</loc></url>
    <url><loc>http://onerepmaxcalculator.com/#!tips</loc></url>
</urlset>
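I wrote that file by hand, but if the route list grows, it’s easy enough to generate. Here’s a small sketch (not something I actually set up for this site):

    # Regenerate sitemap.xml from the list of #! routes so new views don't get
    # forgotten. The route names mirror the ones in the file above.
    ROUTES = ['', '#!contact', '#!account', '#!orientation',
              '#!youtube/rKBq8mTLRpE', '#!thankyou', '#!summary',
              '#!upload', '#!about', '#!how', '#!tips']

    ENTRY = '    <url><loc>http://onerepmaxcalculator.com/%s</loc></url>'


    def build_sitemap(routes):
        lines = ['<?xml version="1.0" encoding="UTF-8"?>',
                 '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
        lines.extend(ENTRY % route for route in routes)
        lines.append('</urlset>')
        return '\n'.join(lines) + '\n'


    with open('sitemap.xml', 'w') as f:
        f.write(build_sitemap(ROUTES))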

Problem 2: The Low Hanging Fruit

Without going into too much detail, since you can easily find this information from other resources, the basics of SEO are pretty much the following (a quick self-audit sketch follows the list):

  • The domain name itself can help a site rank for keywords
  • Fast page load time helps
  • Make sure meta tags like description and title are all populated with information
  • Keywords in the H1 tags help
  • For images, include keyword terms in the alt tags
  • Keywords above the fold are more important than those further down (the content immediately viewable to a user matters most)
  • Links from other sites to yours are important, but the referring sites must be relevant to yours
  • The text of the page content itself is important
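
Most of these are easy to spot-check yourself before reaching for a tool. Here’s a rough self-audit sketch; it assumes the requests and BeautifulSoup libraries, which aren’t otherwise part of this project:

    import requests
    from bs4 import BeautifulSoup


    def audit(url):
        # Flag the easy on-page wins: title, meta description, H1 tags, and
        # image alt text.
        soup = BeautifulSoup(requests.get(url).text, 'html.parser')

        title = soup.title.string if soup.title else None
        description = soup.find('meta', attrs={'name': 'description'})
        h1_tags = [h1.get_text(strip=True) for h1 in soup.find_all('h1')]
        missing_alt = [img.get('src') for img in soup.find_all('img')
                       if not img.get('alt')]

        print('title: %s' % title)
        print('meta description: %s' % (description.get('content') if description else None))
        print('h1 tags: %s' % h1_tags)
        print('images missing alt text: %s' % missing_alt)


    audit('http://onerepmaxcalculator.com/')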

Here are some awesome tools that help to measure how your site is doing with respect to some of the above notes:

  • Feed the Bot (Find out if your site is missing meta tags, what your page load time score is, how keyword heavy your content is, etc)
  • Keyword Rank Checker (Get your current rank in Google and measure your progress as you make changes)
  • Backlink Checker (Find out which sites are linking to yours or what sites link to competitors. See how feasible it is to get links from the same place as competitors)
  • Google Analytics (Find out which search terms are driving your organic traffic)

One more cool thing I found in the process of ranking was http://kaffeine.herokuapp.com/. If your setup is similar to mine and you’re using Heroku to host your app, you’ll know that they make your site go idle after an hour of inactivity. In a low-traffic situation, we run into a chicken-and-egg problem: there’s no traffic because we can’t rank on Google, but we can’t rank on Google when the site takes up to 20 seconds to start up. Kaffeine will ping your application every 10 minutes to keep it alive, so neither Google nor potential users will ever catch your site at a bad moment while it’s asleep.
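
If you’d rather not rely on a third-party service, the same idea is just a few lines of Python running on any machine that stays awake (a sketch of the concept, not Kaffeine’s actual implementation):

    import time

    import requests

    APP_URL = 'http://onerepmaxcalculator.com/'

    # Hit the app every 10 minutes so the free Heroku dyno never goes idle.
    while True:
        try:
            requests.get(APP_URL, timeout=30)
        except requests.RequestException:
            pass  # a failed ping is fine; just try again on the next cycle
        time.sleep(10 * 60)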

Problem 3: Page Content

This might be another problem that plagues developers more than normal builders of the internets. The thing you’re building may consist entirely of server-side logic. In my case, I just needed the user to upload a video, and that really was it. Anything else was clutter, but my site was so minimalistic that, as far as Google could tell, I might as well have been a page hosted on GeoCities in 1995. When my site was first indexed on Google, here’s the site that was beating me out by one place:

[Screenshot of the result outranking OneRepMaxCalculator.com]

I mean come on!

This is why the conventional wisdom for SEO involves writing a blog. It’s a way to keep making changes to your site so the content stays fresh (which is also important to Google), to keep ranking for the keywords you care about, and, perhaps most importantly, to start ranking for a bunch of phrases you never thought of that are probably relevant to your site. For example, on this blog, as of the time of writing this, about 2-% of the referrals coming to my site are from searches for “facial recognition in python.”

Anyway, to correct the problem of ranking poorly, I had to write some kind of content to help Google understand what my site is about. So I spent an hour or so just writing basic “about” pages: in this case, why knowing your one rep max is valuable, how the algorithm works, and tips for uploading clean inputs. In all reality, I doubt many people will read the content; the point is that it’s primarily for search engines. At worst, the content will help lend legitimacy to the site for skeptical incoming users, especially for a paid application.

Problem 4: Backlinks

Getting other pages to link to you is fairly difficult, and getting quality sites to link to you at scale is probably impossible (hence why this is the foundation of Google’s algorithm). In truth, page content is more important than backlinks. As an example, here are screenshots of the #1 and #2 Google results for the term “workout generator”:
[Screenshots of the #1 and #2 Google results for “workout generator”]

Don’t actually visit WorkoutGenerator.net though. I wrote it a long time ago and the front end sucks. As of this writing, WorkoutGenerator.net is #2 on Google. It used to be #1, but then I created an outbound link to OneRepMaxCalculator.com and it dropped in rank. So on that note, be mindful that outbound links can transfer some of your page rank or link juice or whatever you want to call it to other sites.

The point is that you shouldn’t worry about backlinks until a little later, when you have the ball rolling a bit. Getting links can be valuable, but it can also be an uphill battle in the beginning.

For this site, ScottLobdell.me, here are some of the backlinks that I’ve gotten over time naturally with no deliberate effort on my part:

  • From my company’s engineering blog, Hearsay Social Engineering
  • A random StackExchange post linking to my post about Kalman Filtering
  • Python-Soco, in reference to my post about using Sonos to play theme music across the office. This resulted from some brief interaction on Twitter
  • Meetup.com, after posting my slides from a presentation onto the discussion portion
  • Some random post on linear data smoothing, where I left a “thank you” comment

The Results

As it stands, I went from 73 in Google initially to 64, but if I’m not lazy I’ll come back to update this post. There are still a few more things I’d like to try (posting on Reddit, generating traffic through YouTube, posting appropriately on forums, Facebook ad targeting, etc).

Before any changes:

[Screenshot: initial ranking]

Added lots of new copy:

[Screenshot: ranking after adding the new copy]

Removed a link to another site that I thought was relevant (but actually hindered pagerank a bit):

[Screenshot: ranking after removing the outbound link]
