SEO managers understand how essential rankings are to a business. Google giveth and Google taketh away.
Knowing how Google crawls a site is a critical step in optimizing it. That’s why SEO managers need to understand how Google crawls, evaluates, and ranks their website; it is key to building a solid SEO strategy. There are plenty of ways to learn how Google evaluates and ranks your site, but perhaps the most important is a log file analysis. Before I get into log file analysis, let me take a minute to explain how Googlebot crawls your site.
- Google needs to consume and catalog the entire internet regularly.
- Googlebot is the web crawler built and used by Google to find and evaluate the content on the web.
- From these Googlebot crawls, Google will evaluate the relevancy of a site for various search terms and serve it accordingly based on searches made by users.
- Because there are so many webpages for Google to crawl, a domain will only receive a certain amount of crawl budget per day.
Your website’s relevance and keyword rankings are determined by these crawls, so it’s critical to make the best of the limited crawl budget Google allocates to your site.
This is where the log file comes in handy.
What is a log file?
A log file is a file (stored on your web hosting server) that records events occurring on a system (in this case, the web server hosting your domain). There are different types of log files (error log files, access log files, etc.), but when we run a log file analysis we’re specifically looking at the access log file, because it records every request made to the site, including requests from Googlebot.
All sites should have access logs set up by default with their web hosting, but check with your hosting service provider if you want to be sure.
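To make the access log less abstract, here is a minimal Python sketch that parses a single line in the Combined Log Format, the layout commonly used by Apache and NGINX. The sample line, the IP address, and the `parse_line` helper are illustrative assumptions, not data from a real site; your server’s format may differ.

```python
import re
from datetime import datetime

# Combined Log Format: ip, identity, user, [time], "request", status, bytes,
# "referrer", "user agent". Adjust the pattern if your server logs differently.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A "-" in the bytes column means no response body was sent.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry

# Hypothetical sample line, made up for illustration.
SAMPLE_LINE = (
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] '
    '"GET /blog/log-file-analysis/ HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

entry = parse_line(SAMPLE_LINE)
print(entry["path"], entry["status"], entry["user_agent"])
```

Each parsed entry tells you which URL was requested, when, by which user agent, and what status code the server returned. Those are the fields the rest of the analysis relies on.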
What is a log file analysis?
At 97th Floor, a log file analysis is the investigation of your existing access logs, which should provide the insights needed to 1) understand Googlebot’s priorities and behavior while crawling your site, 2) identify any issues Google has crawling the site, and 3) provide an action plan to resolve these issues.
The log file analysis breaks down into three phases:
- Data gathering
- Analysis
- Recommendations + implementation gameplan
I’ll walk through each element to show how each phase feeds into the next.
Gathering the data
Before you begin the log file analysis, you need to be sure you’re starting with the correct data. We’ll be using Screaming Frog Log File Analyzer in this example, which is what we use in practice.
Here’s what you’ll need to run a log file analysis:
- 1-3 months of access logs from the domain being analyzed: 1-3 months’ worth of log file data will give you an idea of Google’s most recent/relevant crawl behavior for your site. If you are using Screaming Frog Log File Analyzer to run the analysis, you’ll need the access log file to be in one of the following formats:
- Apache and NGINX
- Amazon Elastic Load Balancing
- HAProxy
- Screaming Frog crawl data: This data will be overlaid with the log file crawl data in order to match up things like rel="canonical" tags, meta robots tags, and other URL-specific data that will help tell a complete story of how Google is crawling your site, thus leading to more informed recommendations.
- Google Analytics data: This will also be overlaid with the log file crawl data as a way to see how the conversion-heavy pages are being crawled by Google. It will also contain session data that will help us understand the implications of Google’s crawls on your site.
Once you have these, you’ll be able to move on to the actual data analysis.
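The idea of overlaying crawl data on log data can be sketched in a few lines of Python. This is only an illustration of the join, not how Screaming Frog does it internally. The column names (`Address`, `Indexability`, `Canonical Link Element 1`) are assumed to resemble a crawl export and should be adjusted to match your own file; the URLs and hit counts are invented.

```python
import csv
import io
from urllib.parse import urlsplit

# Hypothetical crawl export; column names are assumptions.
CRAWL_CSV = """Address,Indexability,Canonical Link Element 1
https://example.com/,Indexable,https://example.com/
https://example.com/tag/shoes/,Non-Indexable,https://example.com/shoes/
https://example.com/shoes/,Indexable,https://example.com/shoes/
"""

# Hypothetical Googlebot hit counts per URL path, as tallied from access logs.
GOOGLEBOT_HITS = {"/": 40, "/tag/shoes/": 120}

def overlay(crawl_csv_text, googlebot_hits):
    """Attach a Googlebot hit count to every crawled URL, joined on URL path."""
    rows = []
    for row in csv.DictReader(io.StringIO(crawl_csv_text)):
        path = urlsplit(row["Address"]).path or "/"
        rows.append({
            "path": path,
            "indexability": row["Indexability"],
            "canonical": row["Canonical Link Element 1"],
            # URLs that never appear in the logs get a count of zero.
            "googlebot_hits": googlebot_hits.get(path, 0),
        })
    return rows

for row in overlay(CRAWL_CSV, GOOGLEBOT_HITS):
    print(row)
```

In this made-up data, the non-indexable `/tag/shoes/` page soaks up most of the crawl activity while the indexable `/shoes/` page gets none, which is exactly the kind of mismatch the overlay is meant to surface.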
The analysis itself
To analyze all this data I use the following toolset:
- Screaming Frog Log File Analyzer: This is the core tool we use in the log file analysis. Here’s a great intro guide on what this tool is and how to use it.
- Screaming Frog SEO Spider: This is what we’ll use to extract the URL specific data as it relates to the site being crawled.
- Google Sheets or Excel: This is where we’ll be doing our data manipulation.
As we execute the log file analysis, here are the questions we’ll need to answer as we run through the data:
- Are there any subfolders being over/under crawled by Googlebot?
- Where to look in the Screaming Frog Log File Analyzer: Directories, with special attention given to the crawl numbers from Googlebot.
- Are our focus pages absent from Google’s crawls?
- Where to look in the Screaming Frog Log File Analyzer: URLs. If you have Screaming Frog SEO Spider data coupled with the log file data, you can filter it down to the HTML data with the view set to ‘Log File’. From here you can search for the focus pages you want Google to care about and get a feel for how they are being crawled.
- Are there slow subfolders being crawled?
- Where to look in the Screaming Frog Log File Analyzer: Directories. You’ll need to sort by Average Bytes AND Googlebot AND Googlebot Smartphone (descending) so that you can see which subfolders are the slowest.
- Are any non-mobile-friendly subfolders being crawled by Google?
- Where to look in the Screaming Frog Log File Analyzer: Directories. You’ll need to sort by Googlebot Smartphone in order to see which subfolders aren’t getting crawled by Googlebot Smartphone, which could indicate a mobile-friendliness issue that needs to be addressed.
- Is Google crawling redundant subfolders?
- Where to look in the Screaming Frog Log File Analyzer: Directories. As you examine the subfolders listed therein, you should be able to see which directories are redundant and require a solution to effectively deal with them.
- Are any 4XX/302 pages being crawled by Googlebot?
- Where to look in the Screaming Frog Log File Analyzer: URLs. Once you identify the broken pages Google is hitting, you’ll know which ones to prioritize for 301 redirects.
- Is Google crawling any pages marked with the meta robots no-index tag?
- Where to look in the Screaming Frog Log File Analyzer: URLs. You’ll need to sort by ‘Indexability’, then by ‘Googlebot’ and ‘Googlebot Smartphone’ to get a feel for which pages are marked as no-index but are still getting crawled by Google.
- Are the rel canonicals correct for heavily crawled pages?
- Where to look in the Screaming Frog Log File Analyzer: URLs. This is where you can check whether the pages getting crawled the most have the correct rel canonical URLs.
- What updates to the robots.txt file/sitemap.xml are needed in order to ensure our crawl budget is being used efficiently?
- Based on what you find in your analysis, you’ll be able to identify what subfolders/URLs you’ll need to disallow (robots.txt) or remove/include in the sitemap in order to send the clearest signals to Google regarding which pages you would like for it to crawl.
The answers to these questions will give us an idea of how well Google is able to crawl your site, as well as where we can improve the overall configuration of the site in order to make better use of the limited crawl budget Google allocates to it.
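Two of the questions above (which subfolders Googlebot crawls most, and which 4XX/302 URLs it keeps hitting) boil down to simple tallies. The Python sketch below shows roughly what such a tally looks like if you wanted to reproduce it outside the tool. The helper names and sample entries are hypothetical, and the user-agent check is a plain string match that does not verify a request genuinely came from Google.

```python
from collections import Counter

def bot_type(user_agent):
    """Classify a user agent as desktop or smartphone Googlebot, else None."""
    if "Googlebot" not in user_agent:
        return None
    if "Android" in user_agent and "Mobile" in user_agent:
        return "Googlebot Smartphone"
    return "Googlebot"

def top_directory(path):
    """Map a URL path to its first-level subfolder, or "/" for root-level pages."""
    path = path.split("?", 1)[0]
    segments = [s for s in path.split("/") if s]
    if len(segments) >= 2 or (segments and path.endswith("/")):
        return f"/{segments[0]}/"
    return "/"

def summarize(entries):
    """Count Googlebot hits per (directory, bot) and per problem (path, status)."""
    by_directory = Counter()
    problem_urls = Counter()
    for path, status, user_agent in entries:
        bot = bot_type(user_agent)
        if bot is None:
            continue  # ignore traffic that is not Googlebot
        by_directory[(top_directory(path), bot)] += 1
        if status == 302 or 400 <= status < 500:
            problem_urls[(path, status)] += 1
    return by_directory, problem_urls

DESKTOP = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
MOBILE = ("Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
          "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 "
          "(compatible; Googlebot/2.1; +http://www.google.com/bot.html)")

# Hypothetical (path, status, user agent) tuples, e.g. from parsed log lines.
ENTRIES = [
    ("/blog/post-1/", 200, DESKTOP),
    ("/blog/post-2/", 404, MOBILE),
    ("/tag/shoes/?page=2", 200, DESKTOP),
    ("/old-page", 302, DESKTOP),
    ("/blog/post-1/", 200, "Mozilla/5.0 (Windows NT 10.0)"),
]

by_directory, problem_urls = summarize(ENTRIES)
print(by_directory.most_common())
print(problem_urls.most_common())
```

Sorting the directory counts in descending order highlights over-crawled subfolders, and the problem URL counts give a starting list for redirects.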
Recommendations + implementation gameplan
As you answer these questions, you’ll gain valuable insights into what may be holding back your website’s performance and how you can improve it. From these insights, you’ll want to build out a list of recommendations to improve the crawlability of the site, as well as a game plan for implementing them going forward.
Some of these recommendations will include:
- Configuring and improving how Google crawls the site
- Using the robots.txt to disallow sections of the site we’re seeing Google crawl that it doesn’t need to crawl.
- Identifying additional technical SEO fixes for the site
- Updating meta robot tags to better reflect where you would like Google to focus its crawl budget.
- Broken pages
- Building 301 redirects for 404 pages that Googlebot is consistently hitting.
- Duplicate content
- Building a content consolidation game plan for redundant pages that Google is splitting its crawl budget on.
- This game plan would involve mapping out which duplicate/redundant pages (and even subfolders) should either be redirected or have their content folded into the main pages being leveraged in the site’s keyword targeting strategy.
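Before shipping new robots.txt disallow rules, it can help to confirm they block what you intend and nothing more. The sketch below uses Python’s standard `urllib.robotparser` to test a proposed rule set against a few paths. The rules and paths are invented for illustration, and note that this parser uses simple prefix matching, so it may not mirror every detail of how Google interprets robots.txt (wildcards, for example).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical proposed rules; replace with your own draft robots.txt.
PROPOSED_ROBOTS_TXT = "\n".join([
    "User-agent: *",
    "Disallow: /tag/",
    "Disallow: /search/",
])

def blocked_paths(robots_txt, paths, user_agent="Googlebot"):
    """Return the subset of paths the given user agent may not fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]

# Hypothetical mix of pages we want crawled and pages we want excluded.
PATHS_TO_CHECK = ["/tag/shoes/", "/shoes/", "/search/boots/", "/blog/post-1/"]

print(blocked_paths(PROPOSED_ROBOTS_TXT, PATHS_TO_CHECK))
```

If a focus page shows up in the blocked list, the rule is too broad and should be narrowed before it goes live.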
Once these recommendations have been built, you’ll need to work with your web development team to prioritize them. I recommend rating each one on a scale of 1-5 in these three categories:
- Difficulty to implement
- Turnaround time
- Potential for SEO yield
Once the order has been established, you’ll work with your web development team to implement these fixes in a manner that works best for their development cycles.
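One way to turn the three 1-5 ratings into an ordering is to combine them into a single score. The article does not prescribe a formula, so the weighting below is purely an assumption for illustration: it rewards SEO yield and penalizes difficulty and turnaround time equally. The recommendation names and ratings are also made up.

```python
from dataclasses import dataclass

@dataclass
class Recommendation:
    name: str
    difficulty: int  # 1 = easy to implement, 5 = hard
    turnaround: int  # 1 = fast, 5 = slow
    seo_yield: int   # 1 = low potential, 5 = high potential

    def priority(self):
        # Assumed weighting (not from the article): yield counts double,
        # difficulty and turnaround each subtract from the score.
        return self.seo_yield * 2 - self.difficulty - self.turnaround

def prioritize(recommendations):
    """Sort recommendations from highest to lowest priority score."""
    return sorted(recommendations, key=lambda r: r.priority(), reverse=True)

# Hypothetical ratings agreed with the development team.
RECOMMENDATIONS = [
    Recommendation("Consolidate duplicate content", difficulty=4, turnaround=4, seo_yield=5),
    Recommendation("Disallow /tag/ in robots.txt", difficulty=1, turnaround=1, seo_yield=3),
    Recommendation("301 redirect crawled 404s", difficulty=2, turnaround=1, seo_yield=4),
]

for rec in prioritize(RECOMMENDATIONS):
    print(rec.priority(), rec.name)
```

Treat the score as a conversation starter with your developers rather than a final ranking; adjust the weights to reflect what matters most for your site.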
Ready for some results?
It sounds like a lot of work, and it is. Many wonder, “Does this really work?” To answer that, here’s a brief case study that demonstrates the impact a log file analysis can have on an SEO strategy.
During a recent client engagement, we were working to increase e-commerce transactions brought in from Google organic traffic.
We began work as we usually do, with technical audits. As we examined Google Search Console, we noticed some indexation irregularities: specifically, some pages were missing from Google’s index and from its overall coverage of the site. This is a common symptom of a crawlability issue.
In order to improve Google’s ability to recognize our technical fixes and reward the site accordingly, we ran a log file analysis to identify ways we could improve how Google crawls the site. Some of these findings included:
- A number of redundant subfolders being crawled by Google.
- Broken pages missed in our initial site audit that needed to be redirected.
- Various subfolders that Google was spending time crawling that didn’t actually play a role in our SEO keyword ranking strategy.
We created an action plan based on these findings, and worked with a web development team to ensure they were addressed.
Once the log file findings were implemented, we saw the following results (comparing 30 days with the previous 30 days):
- e-commerce transactions increased by 25%
- e-commerce conversion rate increased by 19%
- Google organic e-commerce revenue increased by 25%
As with all SEO strategies, it’s important to make sure Google is able to acknowledge the changes you’re making so that it is able to reward your site accordingly. Running a log file analysis is one of the best ways you can make sure this happens, regardless of the technical SEO fixes you are implementing.