The other day I was emailing with a JSON-LD for SEO customer about how Google crawls a new Shopify store and I realized that I've never written about crawl budgets here.
It's time to correct this now.
Google's crawler is a program that crawls the internet. Which is just a fancy way of saying it downloads webpages, keeps track of links to other webpages, and over time will crawl those links. They are also called spiders because they crawl the world wide web (early Internet geek humor at work here).
The number of pages Google crawls on your domain in a given time frame is your crawl budget.
Let's say you have 50 products plus 3 pages for collections (categories), the homepage, and 10 pages about the store (about, shipping, etc). That's 64 pages.
Let's say Google gives you a crawl budget of 10 pages per day.
The first day Google would start on your homepage and see the collection pages and your store pages since they are linked in the header and footer. It would start to crawl those pages but would run out of its budget before it got through them all (1 homepage + 3 collection pages + 6 of the business pages).
The second day Google might re-crawl the homepage and then pick up the rest of the business pages (1 + 4 remaining pages). Not noticing anything else new on the homepage, it would start to crawl your product pages, getting to 5 of them with the remaining budget.
On the third day it might start to crawl the remaining product pages, or re-crawl another page.
And so on, day after day.
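The day-by-day walkthrough above is essentially a breadth-first crawl with a daily page cap. Here's a minimal Python sketch of the idea, using a made-up store layout and budget (real crawlers also re-crawl pages they've already seen, which this ignores):

```python
from collections import deque

def crawl_schedule(links, start, budget):
    """Simulate a crawler with a fixed daily page budget.

    links: dict mapping each page to the pages it links to
    start: the page the crawler begins from (e.g. the homepage)
    budget: max pages crawled per day
    Returns a list of the pages crawled each day until none are left.
    """
    queue = deque([start])
    seen = {start}
    days = []
    while queue:
        today = []
        while queue and len(today) < budget:
            page = queue.popleft()
            today.append(page)
            # Queue up any newly discovered links for a later day
            for link in links.get(page, []):
                if link not in seen:
                    seen.add(link)
                    queue.append(link)
        days.append(today)
    return days

# Hypothetical store from the example: homepage links to 3 collections
# and 10 business pages; each collection links to some of the 50 products.
site = {"home": [f"collection-{c}" for c in range(3)]
                + [f"page-{p}" for p in range(10)]}
for c in range(3):
    site[f"collection-{c}"] = [f"product-{i}" for i in range(50) if i % 3 == c]

days = crawl_schedule(site, "home", budget=10)
print(len(days))  # → 7 days to get through all 64 pages
```

At 10 pages a day, the 64-page store takes a full week just to be seen once, which is the whole point of the budget idea.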
(Sidebar: The 10-pages-per-day crawl budget is just an arbitrary number. On my website Google averages 399 pages per day, but they've crawled as many as 3,012 pages in a day in the last 90 days. How popular a website is (via quality links), how often it's updated, and the quality of its content can all influence Google and their crawler.)
Eventually Google will have seen all of your pages and it'll just start re-checking pages to see if they've been updated or if there are new links.
The frequency of these re-checks is private, but Shopify can influence Google somewhat with the sitemap.xml file it creates. This is the same file you can submit to Google inside of Google Search Console.
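For reference, the sitemap.xml Shopify generates is a sitemap index that points at per-resource sitemaps, roughly along these lines (the store URL here is illustrative):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example-store.myshopify.com/sitemap_products_1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://example-store.myshopify.com/sitemap_collections_1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://example-store.myshopify.com/sitemap_pages_1.xml</loc>
  </sitemap>
</sitemapindex>
```

You don't have to build or maintain this yourself; Shopify keeps it up to date as you add products and pages.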
The crawl budget and its limits are one of the reasons why I recommend launching a store early. If you wait to launch until you have all 1,000 products in the store, Google still might not get to them all for months. If you soft-launched even 10 days early, that's 10 more days of crawling Google can do before your official launch.
This example is over-simplified but once you understand the basics here you'll be able to make better decisions about your store's SEO, links, and how you publish content.
Today would be a good day to install JSON-LD for SEO if you haven't yet.
You still have a couple of months before the holiday season. Just enough time to start getting Rich Snippets and beating your competition in the search results.