AJAX Crawling Live?

Friday, January 29th, 2010

Vanessa Fox at Search Engine Land has a fabulous post on Google’s proposed crawling of AJAX (click the link at the end of this post to read her definitive step-by-step outline of how to make your own pages more crawlable). Here we are just going to recap Google’s journey to AJAX crawling:

October 2009: At the SMX East conference, Google announces that it is attempting to develop a way to crawl AJAX:

Today we’re excited to propose a new standard for making AJAX-based websites crawlable. This will benefit webmasters and users by making content from rich and interactive AJAX-based websites universally accessible through search results on any search engine that chooses to take part. We believe that making this content available for crawling and indexing could significantly improve the web.

While AJAX-based websites are popular with users, search engines traditionally are not able to access any of the content on them. The last time we checked, almost 70% of the websites we know about use JavaScript in some form or another. Of course, most of that JavaScript is not AJAX, but the better that search engines could crawl and index AJAX, the more that developers could add richer features to their websites and still show up in search engines.

Some of the goals that we wanted to achieve with this proposal were:

  • Minimal changes are required as the website grows
  • Users and search engines see the same content (no cloaking)
  • Search engines can send users directly to the AJAX URL (not to a static copy)
  • Site owners have a way of verifying that their AJAX website is rendered correctly and thus that the crawler has access to all the content

Google had previously warned against excessive use of AJAX:

Many webmasters have discovered the advantages of using AJAX to improve the user experience on their sites, creating dynamic pages that act as powerful web applications. But like Flash, AJAX can make a site difficult for search engines to index if the technology is not implemented carefully. There are two main search engine issues around AJAX: Making sure that search engine bots can see your content, and making sure they can see and follow your navigation.

While Googlebot is great at understanding the structure of HTML links, it can have difficulty finding its way around sites which use JavaScript for navigation. We’re working on doing a better job of understanding JavaScript, but your best bet for creating a site that’s crawlable by Google and other search engines is to provide HTML links to your content.

Google provides tips on how to optimize AJAX, but Vanessa’s post is better. Here’s a snippet; her post has the whole breakdown:

(Search Engine Land:) This implementation basically requires that you:

  • Modify your AJAX implementation so that URLs that contain hash marks (#) are also available via the hash mark/exclamation point (#!) combination (or, as I recommend below, that you replace the # versions entirely with the #! ones).
  • Configure a headless browser on your web server that processes the ?_escaped_fragment_= versions of the URLs, executes the JavaScript on the page and returns a static page.
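To make the two bullet points a little more concrete, here is a rough Python sketch of the URL mapping they imply: a #! URL that users see is rewritten into the ?_escaped_fragment_= form that the crawler requests, and the server turns it back before handing it to the headless browser. The function names and the example.com URLs are made up for illustration, and the set of characters left unescaped is an assumption, so check it against Google's own specification before relying on it.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit

ESCAPED_FRAGMENT = "_escaped_fragment_"

# Assumption: characters left unescaped in the fragment value. Notably
# '#', '%', '&', '+' and spaces are NOT in this set, so they get escaped.
SAFE_CHARS = "!$'()*,/:;=?@"


def to_crawler_url(pretty_url: str) -> str:
    """Rewrite a #! URL into its ?_escaped_fragment_= equivalent."""
    parts = urlsplit(pretty_url)
    if not parts.fragment.startswith("!"):
        return pretty_url  # plain # fragment or none: nothing to rewrite
    state = quote(parts.fragment[1:], safe=SAFE_CHARS)
    param = f"{ESCAPED_FRAGMENT}={state}"
    # Append to an existing query string if there is one.
    query = f"{parts.query}&{param}" if parts.query else param
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def from_crawler_url(crawler_url: str) -> str:
    """Recover the #! URL from a ?_escaped_fragment_= request."""
    parts = urlsplit(crawler_url)
    prefix = ESCAPED_FRAGMENT + "="
    kept, state = [], None
    for pair in parts.query.split("&"):
        if pair.startswith(prefix):
            state = unquote(pair[len(prefix):])
        elif pair:
            kept.append(pair)
    if state is None:
        return crawler_url  # not a crawler-style request
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, "&".join(kept), "!" + state)
    )


if __name__ == "__main__":
    pretty = "http://example.com/page#!key=value"
    ugly = to_crawler_url(pretty)
    print(ugly)
    print(from_crawler_url(ugly))
```

In a setup like the one described, the web server would call something like from_crawler_url on incoming crawler requests, load the resulting #! URL in the headless browser, and return the rendered HTML as the static page.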

Confused yet? So is everyone else, but Vanessa does have more info for the code monkeys among us. She warns that not everything is working as smoothly as Google would like, so full-scale modifications and AJAX-built websites are not quite the way to go just yet.
