Google’s ranking algorithm relies on a set of intrinsic values manually defined by its engineers.
In other words, these are fixed parameters that do not change dynamically and apply as absolute rules.
In this article, let’s investigate one of these values — a key element that sheds light on how the world’s leading search engine manages a fundamental aspect of its ranking system: indexing.
To be indexed, or not to be indexed, that is the question
For any website, being indexed by Google—and staying indexed—is a critical challenge.
If a page is not indexed, all other SEO efforts—content creation, link acquisition, conversion optimization, and more—become useless.
Yet, Google’s index is not limitless. Or rather, Google does not want it to be limitless.
In 2020, Google’s index contained 400 billion documents (pages). This figure was revealed during the cross-examination of Pandu Nayak, Google’s Vice President of Search, in the U.S. antitrust case against Google.
From Google’s perspective as a functioning, profitability-conscious business, a larger number of indexed pages means more storage space and more computing power to analyze, classify, and monitor them.
This leads to increased operational costs—something every company, including Google, is looking to reduce today.
To control the growth of its index, Google’s search engine employs a wide array of techniques, including canonicalization (deduplication), predictive crawling, penalties, and more.
But what about pages that have been in the index for a long time? Most likely, not all of them deserve to stay.
Google has a precise and well-defined mechanism to clean up its index.
Let’s investigate together!
Setting Up Our Playground
For our research, we will use Screaming Frog SEO Spider, whose paid version allows us to enrich our crawl data with information from the Google Search Console API.
- In the menu, select: Configuration > API Access > Google Search Console.
- Log in to your account.
- Go to the “URL Inspection” tab.
- Check both boxes as shown in the image below.
“URL Inspection” is an API endpoint within the Google Search Console API that lets you retrieve the technical status of a page as the search engine sees it.
This tool is very useful because it avoids inspecting URLs one by one in the Search Console interface. The only limitation is a quota of 2,000 URLs per day per property, which can be worked around by creating multiple properties.
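If you prefer to query Google directly rather than go through Screaming Frog, here is a minimal sketch using the google-api-python-client library. It assumes you already hold OAuth credentials with Search Console access; the site URL and the list of pages are placeholders to adapt to your own property:

```python
# Minimal sketch: querying the Search Console URL Inspection API directly.
# Assumes OAuth credentials with Search Console (webmasters.readonly) access.
from googleapiclient.discovery import build


def inspect_urls(credentials, site_url, urls):
    """Return indexing status and last crawl date for each URL (quota: 2,000/day per property)."""
    service = build("searchconsole", "v1", credentials=credentials)
    results = {}
    for url in urls:
        response = service.urlInspection().index().inspect(
            body={"inspectionUrl": url, "siteUrl": site_url}
        ).execute()
        status = response["inspectionResult"]["indexStatusResult"]
        results[url] = {
            "verdict": status.get("verdict"),           # e.g. PASS / NEUTRAL
            "coverage": status.get("coverageState"),    # e.g. "Crawled - currently not indexed"
            "last_crawl": status.get("lastCrawlTime"),  # ISO 8601 timestamp
        }
    return results
```

The fields read here (verdict, coverage state, last crawl time) are the same ones Screaming Frog surfaces as the “Summary”, “Coverage” and “Last Crawl” columns used below.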
It’s time to start your crawl.
Once the crawl has started, open the “Google Search Console” tab, where you’ll find plenty of useful information pulled directly from Google’s index:
We find ourselves facing 20 columns of scattered technical indicators, and for now, nothing seems clear.
But as Henri Bergson said 90 years ago: “Disorder is simply order that we are not looking for.”
Let’s narrow our focus to four key columns:
- Summary (whether the page is present in Google’s index or not)
- Coverage (the reason why the page is not indexed, if applicable)
- Last crawl (the date when Googlebot last visited the page)
- Days Since Last Crawled (the number of days elapsed since that last visit)
Here, we can see for each URL whether it is indexed by Google and how much time has passed since the last crawl.
Let’s sort the data by the “Days Since Last Crawled” column in ascending order.
And suddenly, our data organizes itself into a clear system of causes and effects.
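If you would rather work on the exported data outside of Screaming Frog, the same narrowing and sorting can be reproduced in a few lines of pandas. The file name is a placeholder and the column names follow the English Screaming Frog interface, so adjust them to your own export:

```python
import pandas as pd

# Placeholder file name: CSV export of the crawl with Search Console API data attached.
df = pd.read_csv("internal_all.csv")

# The four columns that matter for this analysis, plus the URL itself.
cols = ["Address", "Summary", "Coverage", "Last Crawl", "Days Since Last Crawled"]

# Sort by crawl recency: the least recently crawled pages end up at the bottom.
print(df[cols].sort_values("Days Since Last Crawled", ascending=True).tail(20))
```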
Let’s investigate this with five real cases covering different site types and markets.
Case 1: Official Website of a Tire Manufacturer (Portugal)
This is one of the most well-known tire manufacturers in the Portuguese market.
After applying the analysis described above, we observe two possible states in the “Summary” column:
- “URL is on Google”
- “URL is not on Google”
But the most interesting insight comes from the “Days Since Last Crawled” column.
It appears there is a causal relationship between crawl frequency and a URL’s indexation status.
More specifically, URLs seem to be deindexed if Googlebot hasn’t crawled them for 130 days.
An Important Clarification
When configuring Screaming Frog, we made sure to check the option to send only indexable URLs to the URL Inspector.
In other words, the data we’re analyzing includes only technically valid pages—no noindex tags, no rel=canonical pointing elsewhere, and no pages blocked by robots.txt or other restrictions.
To avoid survivorship bias, here are four more real examples.
Case 2: Sport News Website (France)
This is a completely different type of site, yet we observe the same pattern:
Pages that haven’t been crawled for 130 days are automatically removed from Google’s index.
They transition from the status “Submitted and indexed” to “Crawled – currently not indexed.”
Case 3: Fashion Magazine (Italy)
We observe exactly the same trend on this Italian fashion magazine:
Pages that haven’t been crawled for 130 days gradually shift from “Submitted and indexed” to “Crawled – currently not indexed.”
Case 4: Corporate Site with a Forum (Worldwide)
Yet another type of website—a business site with an integrated forum for Q&A.
Once again, the same observation: the 130-day threshold applies.
Pages that haven’t been crawled for this duration tend to shift from “Submitted and indexed” to “Crawled – currently not indexed.”
Case 5: Governmental Website (France)
For the fifth and final example, a French institutional website shows the same pattern:
Pages that haven’t been crawled for 130 days transition from “Submitted and indexed” to “Crawled – currently not indexed.”
The 130-Day Rule
In all the observed examples, we consistently see the same trend:
- The indexation status of our pages depends on the crawl frequency by Google.
- It seems that Google applies a static crawl threshold of 130 days. Each page on the site has its own crawl frequency, which evolves over time. If this frequency drops to the point where Googlebot hasn’t crawled the page for 130 days, the page is removed from the index.
- Therefore, it’s important to identify pages that are approaching or past the 130-day window and work on improving their value (a sketch of how to flag them follows this list).
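Continuing the pandas sketch from earlier, one way to put this into practice is to flag indexed pages that are approaching or past the threshold. The 30-day warning margin below is an arbitrary choice of mine, not something Google documents:

```python
THRESHOLD_DAYS = 130   # deindexation threshold observed in the cases above
WARNING_MARGIN = 30    # arbitrary margin to catch pages before they cross the line

# Pages currently in the index, per the "Summary" column.
indexed = df[df["Summary"] == "URL is on Google"]

at_risk = indexed[indexed["Days Since Last Crawled"] >= THRESHOLD_DAYS - WARNING_MARGIN]
dropped = df[df["Days Since Last Crawled"] >= THRESHOLD_DAYS]

print(f"{len(at_risk)} indexed pages not crawled for {THRESHOLD_DAYS - WARNING_MARGIN}+ days")
print(f"{len(dropped)} pages past the {THRESHOLD_DAYS}-day threshold")
```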
What to Do with Pages Not Crawled for 130 Days?
Now, the legitimate question: What do we do with this knowledge?
To answer this, it’s important to understand how the search engine allocates and distributes its crawling resources.
The crawl frequency is a dynamic value that the search engine constantly seeks to optimize in order to crawl the pages that are most worthy of it.
“If you want to increase how much we crawl, then you somehow have to convince search that your stuff is worth fetching, which is basically what the scheduler is listening to.”
Gary Illyes, Google Analyst.
Crawl Frequency Calculation
From a website standpoint, the crawl frequency is primarily determined by two groups of factors:
- Content Quality of the Page
- PageRank of the Page
Now, gather the pages that haven’t been crawled for 130 days and try to answer the following questions:
From a Qualitative Perspective:
- What do these pages have in common?
- Do they belong to a specific type?
For example:
- On our tire manufacturer site (Case 1): Among the pages in question, we find category pages by brand that lack products or any differentiating content.
- On the media site (Case 2): These are pages with very similar tags that can be optimized and enriched further.
- On the fashion magazine site (Case 3): These are very short pieces of content originally designed for social media distribution.
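A quick, rough way to see what these stale pages have in common is to group them by their first path segment, which often maps to a template or page type. This is only a sketch built on the DataFrame from the earlier examples:

```python
from urllib.parse import urlparse

# Pages Googlebot has not visited for 130 days or more.
stale = df[df["Days Since Last Crawled"] >= 130].copy()

# First path segment as a rough proxy for template / page type.
stale["section"] = stale["Address"].map(
    lambda u: urlparse(u).path.strip("/").split("/")[0] or "(root)"
)

# Which sections concentrate the pages Google has stopped crawling?
print(stale["section"].value_counts().head(15))
```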
By improving the quality of these pages, you can enhance their crawlability and, in turn, their indexation.
“Scheduling is very dynamic. As soon as we get the signals back from search indexing that the quality of the content has increased across this many URLs, we would just start turning up demand.”
Gary Illyes, Google Analyst.
From the Perspective of PageRank
In addition to content, the frequency with which a page is crawled by Google is closely tied to its authority, which is formalized in the concept of PageRank.
The deeper a page is within the site structure, the less important it is considered.
And when this importance drops to a minimal threshold, to the point where Googlebot doesn’t deem it necessary to crawl the page more than once every 130 days, it eventually gets removed from the index.
This is essentially a clean-up process performed by Google, where pages deemed unimportant are removed from the index. This also explains why some pages that were indexed for a long time can suddenly be excluded.
Questions to Consider Regarding PageRank:
- Where are the deindexed pages located in the site structure?
- What is their depth level?
- Do they receive enough internal and external links?
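To answer these questions at scale rather than page by page, you can compare the structural metrics Screaming Frog already collects between stale and recently crawled pages. A sketch, assuming the “Crawl Depth” and “Inlinks” columns are present in your export:

```python
# Compare structural signals between stale pages and recently crawled ones.
# Assumes the "Crawl Depth" and "Inlinks" columns are present in the export.
stale = df[df["Days Since Last Crawled"] >= 130]
fresh = df[df["Days Since Last Crawled"] < 130]

for label, subset in [("stale (130+ days)", stale), ("recently crawled", fresh)]:
    print(
        f"{label}: median depth = {subset['Crawl Depth'].median()}, "
        f"median inlinks = {subset['Inlinks'].median()}"
    )
```

If the stale group sits markedly deeper and receives fewer internal links than the fresh group, the internal linking structure is a good place to start.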
Two Final Tips:
- If you want to know which pages are considered the most valuable by Google, crawl frequency is one of the most reliable indicators.
- To conduct this study across your entire site, you can analyze your server logs. Ask your host or developer to export the logs for at least the past 130 days, then cross-reference them with your crawl data: pages found in your crawl that show no Googlebot visit in the logs over the last 130 days are almost certainly not indexed (see the sketch below).
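As a rough illustration of that cross-referencing step, here is a sketch that reads a combined-format access log and a plain list of crawled URL paths. Both file names are placeholders, real log formats vary, and proper Googlebot verification would require a reverse DNS check, which is skipped here:

```python
import re
from datetime import datetime, timedelta

# Hypothetical file names; adapt to your own exports.
LOG_FILE = "access.log"          # combined-format server log covering 130+ days
CRAWL_FILE = "crawled_urls.txt"  # one URL path per line, from your crawler export

cutoff = datetime.now() - timedelta(days=130)

# Naive combined-log parsing: [day/month/year:...] "GET /path HTTP/1.1" ... "user-agent"
line_re = re.compile(r'\[(\d{2}/\w{3}/\d{4})[^\]]*\] "(?:GET|HEAD) (\S+)[^"]*" .* "([^"]*)"$')

seen_by_googlebot = set()
with open(LOG_FILE) as f:
    for line in f:
        match = line_re.search(line)
        if not match:
            continue
        date_str, path, user_agent = match.groups()
        if "Googlebot" not in user_agent:
            continue
        if datetime.strptime(date_str, "%d/%b/%Y") >= cutoff:
            seen_by_googlebot.add(path)

with open(CRAWL_FILE) as f:
    crawled = {line.strip() for line in f if line.strip()}

never_visited = crawled - seen_by_googlebot
print(f"{len(never_visited)} crawled URLs with no Googlebot hit in the last 130 days")
```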
Source: This is the English version of the article from my newsletter “SEO, Data & Growth”: https://newsletter.alekseo.com/p/12-google-and-la-regle-des-130-jours.