
Google's bots fall into three types: crawling bots, indexing bots, and ranking bots. Each type runs a different algorithm, is assigned a specific task, and is proficient at that task only. If crawling bots were asked to perform ranking, they would likely be slower and less accurate than the three types working together.
Let t1 be the time crawling bots take to crawl millions of websites, t2 the time indexing bots take to understand the context and arrange pages by category in the data centre, and t3 the time ranking bots take to assign ranks on the basis of billions of parameters covering both on-page and off-page presence. If these three stages run one after another, the total time is t1 + t2 + t3. If they all run at the same time, the total time is max(t1, t2, t3).
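To make the comparison concrete, here is a minimal Python sketch of the two totals. The function names and the stage times (5, 3, and 2 in arbitrary units) are made up for illustration and are not measured values.

```python
def sequential_time(t1: float, t2: float, t3: float) -> float:
    """Total time when crawling, indexing, and ranking run one after another."""
    return t1 + t2 + t3


def concurrent_time(t1: float, t2: float, t3: float) -> float:
    """Total time when the three stages run at the same time:
    the slowest stage decides the total."""
    return max(t1, t2, t3)


# Hypothetical stage times in arbitrary units (illustrative only).
t1, t2, t3 = 5.0, 3.0, 2.0
print(sequential_time(t1, t2, t3))  # 10.0
print(concurrent_time(t1, t2, t3))  # 5.0
```

With these numbers the concurrent total is half the sequential one, and the saving grows as the three stage times get closer to each other.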
Millions of websites are built and submitted for crawling each day, so t1 keeps increasing. t2 also increases as the number of crawled websites grows, but most crawled pages are rejected, so the time spent on indexing and ranking is small compared to the number of crawled sites.
Since a website that has not been crawled cannot be indexed or ranked, we can conclude that the bots work collaboratively and simultaneously, carrying out crawling, indexing, and ranking for each website.
Suppose n website URLs need to be crawled, and for each one the crawling bot takes time t1, the indexing bot takes time t2, and the ranking bot takes time t3. The other websites then have to wait for some time tw, which can be calculated as follows.
If a website cannot be indexed or ranked before it is crawled, then crawling, indexing, and ranking form a dependent pipeline.
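A dependent pipeline of this kind can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not in the text above: every URL is available at time 0, each stage handles one URL at a time, and the stage times are the same for every URL. The function name `pipeline_makespan` is hypothetical.

```python
def pipeline_makespan(n: int, stage_times: tuple[float, float, float]) -> float:
    """Finish time of the last of n URLs in a crawl -> index -> rank pipeline.

    Assumes each stage handles one URL at a time and a URL enters a stage
    only after it has left the previous stage.
    """
    finish = [0.0] * len(stage_times)  # finish[s]: time at which stage s is next free
    for _ in range(n):
        ready = 0.0  # time at which the current URL leaves the previous stage
        for s, t in enumerate(stage_times):
            start = max(ready, finish[s])  # wait for both the URL and the stage
            finish[s] = start + t
            ready = finish[s]
    return finish[-1]


# Hypothetical stage times (t1, t2, t3) in arbitrary units.
times = (5.0, 3.0, 2.0)
print(pipeline_makespan(3, times))  # 20.0
print(3 * sum(times))               # 30.0, if each URL were processed start to finish alone
```

Under these assumptions the pipeline finishes three URLs in 20 units instead of 30, because the indexing and ranking of one URL overlap with the crawling of the next.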
Cumulative Waiting Time for a Continuous Stream of URLs
For a continuous stream of URLs, the cumulative waiting time can be expressed as an integral over the URL number. When the stage times are the same for every URL, the integrand is constant and the cumulative waiting time becomes:
tw = n[(t1 + t2 + t3) − max(t1, t2, t3)]
More generally, if the crawling, indexing, and ranking times vary from one URL to another, denoted by t1(u), t2(u), and t3(u), respectively, then the total waiting time is:
tw = ∫₀ⁿ [t1(u) + t2(u) + t3(u) − max{t1(u), t2(u), t3(u)}] du
where u denotes the URL number.
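The two expressions above can be checked numerically. The sketch below approximates the integral with a midpoint rule and compares it with the constant-time formula. The function names, the step count, and the example stage-time functions are assumptions chosen for illustration only.

```python
from typing import Callable

StageTime = Callable[[float], float]  # maps URL number u to a stage time


def constant_wait(n: float, t1: float, t2: float, t3: float) -> float:
    """Closed form for constant stage times: n[(t1 + t2 + t3) - max(t1, t2, t3)]."""
    return n * ((t1 + t2 + t3) - max(t1, t2, t3))


def cumulative_wait(n: float, t1: StageTime, t2: StageTime, t3: StageTime,
                    steps: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral over u in [0, n] of
    t1(u) + t2(u) + t3(u) - max(t1(u), t2(u), t3(u))."""
    du = n / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * du  # midpoint of the k-th slice
        a, b, c = t1(u), t2(u), t3(u)
        total += (a + b + c - max(a, b, c)) * du
    return total


# Constant hypothetical stage times: the integral should match the closed form.
closed = constant_wait(100, 5.0, 3.0, 2.0)
numeric = cumulative_wait(100, lambda u: 5.0, lambda u: 3.0, lambda u: 2.0)
print(closed, round(numeric, 6))

# Varying hypothetical stage times: ranking slows down as u grows.
varying = cumulative_wait(100, lambda u: 2.0, lambda u: 3.0, lambda u: 1.0 + 0.05 * u)
print(round(varying, 6))
```

For constant stage times the numerical result agrees with the closed form, which is a quick check that the simplified formula is the integral of a constant integrand.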
Accumulated Waiting Time in a Dependent Pipeline
If the u-th URL cannot begin processing until the previous URLs have passed through the pipeline, then the waiting time accumulated by all URLs is given by:
tw = ∫₀ⁿ (n − u) max{t1(u), t2(u), t3(u)} du
This expression represents the sum of the delays imposed on the remaining URLs by the processing time of each URL.
Special Case for Constant Processing Times
For constant processing times t1, t2, and t3, the accumulated waiting time simplifies to:
tw = (n²/2) max(t1, t2, t3)
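As a rough check of this special case, the sketch below compares the closed form with a discrete count in which each URL waits for every URL ahead of it to clear the slowest stage. The discrete version is an assumption added for illustration; it gives n(n − 1)/2 times the bottleneck, which approaches (n²/2) max(t1, t2, t3) as n grows. The function names and stage times are hypothetical.

```python
def accumulated_wait_discrete(n: int, t1: float, t2: float, t3: float) -> float:
    """Sum of per-URL waits when URL i (0-based) waits for the i URLs ahead of it,
    each of which occupies the slowest stage for max(t1, t2, t3)."""
    bottleneck = max(t1, t2, t3)
    return sum(i * bottleneck for i in range(n))


def accumulated_wait_continuous(n: float, t1: float, t2: float, t3: float) -> float:
    """Closed form from the integral: (n^2 / 2) * max(t1, t2, t3)."""
    return (n ** 2 / 2) * max(t1, t2, t3)


# Hypothetical stage times in arbitrary units.
n, t1, t2, t3 = 1000, 5.0, 3.0, 2.0
print(accumulated_wait_discrete(n, t1, t2, t3))    # 2497500.0
print(accumulated_wait_continuous(n, t1, t2, t3))  # 2500000.0
```

The two values differ by about 0.1% at n = 1000. In both, the accumulated wait grows with the square of the number of URLs.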
This integral formulation models the collaborative and simultaneous operation of crawling, indexing, and ranking bots while accounting for queueing delays among the URLs.
This integral model establishes that search engine bots operate as a dependent yet parallel pipeline, where crawling, indexing, and ranking are continuously coordinated. The equations quantify queueing delays, reveal bottlenecks, and explain why distributed parallel processing is essential for handling billions of URLs efficiently. Consequently, the formulation provides a mathematical foundation for studying the dynamics and scalability of modern search engine architectures.
