Someone on Reddit asked about crawl problems on their website, wondering whether a large number of 301 redirects pointing to 410 responses was using up Googlebot’s crawl budget. Google’s John Mueller explained why the site was likely being crawled slowly and clarified a few points about crawl budget in general.
Crawl Budget
People often talk about Google having a “crawl budget,” meaning a limit on how much Google will crawl a given website. The term was coined by the SEO community to explain why some sites don’t get crawled enough; the idea is that each site is allotted a set amount of crawling, a cap on how much attention Google gives it.
To understand the idea, it helps to know where it came from. Google has always said there is no single thing called a crawl budget, although the way Google crawls a site can make it look as if such a limit exists.
A top Google engineer, Matt Cutts, hinted at this in a 2010 interview.
Matt answered a question about a Google crawl budget by first explaining that there was no crawl budget in the way that SEOs conceive of it:
“The first thing is that there isn’t really such thing as an indexation cap. A lot of people were thinking that a domain would only get a certain number of pages indexed, and that’s not really the way that it works.
There is also not a hard limit on our crawl.”
In 2017, Google published an explanation of how it crawls websites, pulling together the various crawling-related concepts that the SEO community had been lumping under “crawl budget.” The result is more precise than the broad catch-all term that was used before.
Here are the key points about crawl budget in simpler terms:
a. Crawl rate is the number of pages Googlebot can fetch, which depends on how quickly the server can serve them.
b. A server that hosts many websites may have a very large number of pages in total, and Google has to crawl them within the limits of what the server can handle.
c. Duplicate and low-value pages consume server resources, which reduces how many worthwhile pages Google can crawl.
d. Lightweight, fast-loading pages are easier for Google to crawl.
e. Soft 404 pages can pull Google’s attention toward unimportant pages at the expense of the ones that matter (a simple way to spot them is sketched after this list).
f. A site’s internal link structure influences which pages Google decides to crawl.
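To make point e more concrete, here is a minimal sketch of what a soft 404 check could look like. It assumes the third-party requests library is installed; the URL list and the “not found” phrases are hypothetical placeholders you would replace with your own pages and wording.

```python
# Minimal soft-404 check: a soft 404 is a page that reads like "not found"
# but still returns HTTP 200, so Googlebot keeps treating it as a live page.
import requests

# Hypothetical URLs to audit; replace with your own list or a sitemap export.
URLS = [
    "https://example.com/old-url.php?id=1234",
    "https://example.com/discontinued-product",
]

# Phrases that typically indicate a "page gone" template on the site.
NOT_FOUND_PHRASES = ["page not found", "no longer available", "does not exist"]

def looks_like_soft_404(url: str) -> bool:
    """Return True if the URL answers 200 but the body reads like a 404 page."""
    response = requests.get(url, timeout=10)
    if response.status_code != 200:
        return False  # real 404/410 responses are not soft 404s
    body = response.text.lower()
    return any(phrase in body for phrase in NOT_FOUND_PHRASES)

if __name__ == "__main__":
    for url in URLS:
        if looks_like_soft_404(url):
            print(f"Possible soft 404: {url}")
```

Pages flagged this way are better served with a real 404 or 410 status so Googlebot can drop them instead of revisiting them.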
Reddit Question About Crawl Rate
The person asking on Reddit wanted to know whether the way their site answers requests for long-gone pages was affecting how often Google crawls it. In their setup, a request for the old insecure (HTTP) URL of a page that no longer exists is redirected to the HTTPS version of the same URL, which then responds that the page is permanently gone.
This is what they asked:
“I’m trying to make Googlebot forget to crawl some very-old non-HTTPS URLs, that are still being crawled after 6 years. And I placed a 410 response, in the HTTPS side, in such very-old URLs.
So Googlebot is finding a 301 redirect (from HTTP to HTTPS), and then a 410.
http://example.com/old-url.php?id=xxxx -301-> https://example.com/old-url.php?id=xxxx (410 response)
Two questions. Is G**** happy with this 301+410?
I’m suffering ‘crawl budget’ issues, and I do not know if this two responses are exhausting Googlebot
Is the 410 effective? I mean, should I return the 410 directly, without a first 301?”
Google’s John Mueller answered:
“G*?
301’s are fine, a 301/410 mix is fine.
Crawl budget is really just a problem for massive sites ( https://developers.google.com/search/docs/crawling-indexing/large-site-managing-crawl-budget ). If you’re seeing issues there, and your site isn’t actually massive, then probably Google just doesn’t see much value in crawling more. That’s not a technical issue.”
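For anyone who wants to verify the same 301-then-410 pattern on their own site, here is a small sketch that follows the redirect chain hop by hop and prints each status code. It assumes the third-party requests library and uses a placeholder URL modeled on the example from the Reddit post.

```python
# Follow a redirect chain manually and print each hop, so you can confirm
# that an old HTTP URL answers 301 -> HTTPS URL -> 410.
from urllib.parse import urljoin

import requests

def trace_chain(url: str, max_hops: int = 5) -> None:
    """Print the status code of every hop until a non-redirect response."""
    for _ in range(max_hops):
        response = requests.get(url, allow_redirects=False, timeout=10)
        print(f"{response.status_code}  {url}")
        location = response.headers.get("Location")
        if response.status_code in (301, 302, 307, 308) and location:
            url = urljoin(url, location)  # follow the redirect by hand
        else:
            return
    print("Stopped: too many redirects")

if __name__ == "__main__":
    # Placeholder modeled on the asker's example URL.
    trace_chain("http://example.com/old-url.php?id=xxxx")
```

Per Mueller’s answer, the 301/410 mix itself is fine; a trace like this is just a way to confirm the chain behaves as intended.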
Reasons For Not Getting Crawled Enough
Mueller’s answer suggests that Google may simply not see enough value in crawling more of the site’s pages. When that’s the case, the pages themselves deserve a review to work out why Google treats them as low value.
Some SEO practices produce pages with little value and no originality. A common strategy, for example, is to analyze the top-ranked pages for a query to understand why they rank, then use those findings to build a page that imitates what already works in the search results.
That sounds sensible, but it doesn’t create anything of value. Think of it as a binary choice: zero is what already exists in the search results, and one is something original. Copying what is already there (zero) leaves you with a website that adds nothing new.
There are technical issues, such as server health, that can slow down how quickly Google crawls a site. But when Google talks about “crawl budget,” it does so almost entirely in the context of very large websites, not small or medium-sized ones.
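If you suspect server health is part of the problem, a rough response-time check is an easy first step. The sketch below times a handful of requests; the sample URLs and the one-second threshold are illustrative assumptions, not limits documented by Google.

```python
# Rough server-health check: time a few requests and flag slow responses,
# since crawl rate depends in part on how quickly the server answers.
import requests

# Illustrative sample; in practice use a spread of real URLs from the site.
SAMPLE_URLS = [
    "https://example.com/",
    "https://example.com/some-category/",
]

SLOW_THRESHOLD_SECONDS = 1.0  # arbitrary cutoff for this sketch

if __name__ == "__main__":
    for url in SAMPLE_URLS:
        response = requests.get(url, timeout=15)
        elapsed = response.elapsed.total_seconds()
        flag = "SLOW" if elapsed > SLOW_THRESHOLD_SECONDS else "ok"
        print(f"{flag:>4}  {elapsed:.2f}s  {response.status_code}  {url}")
```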
For more such updates, tips and learning resources, stay tuned to Insitebuild Blog.