Ever wondered why Google Search Appliance (GSA) returns at most 1,000 results? Well, here is what Elasticsearch: The Definitive Guide has to say about why deep paging is problematic:
"To understand why deep paging is problematic, let’s imagine that we are searching within a single index with five primary shards. When we request the first page of results (results 1 to 10), each shard produces its own top 10 results and returns them to the requesting node, which then sorts all 50 results in order to select the overall top 10.
Now imagine that we ask for page 1,000—results 10,001 to 10,010. Everything works in the same way except that each shard has to produce its top 10,010 results. The requesting node then sorts through all 50,050 results and discards 50,040 of them!
You can see that, in a distributed system, the cost of sorting results grows exponentially the deeper we page. There is a good reason that web search engines don’t return more than 1,000 results for any query."
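To make the numbers in that quote concrete, here is a rough back-of-the-envelope sketch in Python. It is not Elasticsearch code, just the arithmetic from the example above; the names `deep_paging_cost`, `PagingCost`, `offset`, `size`, and `shards` are made up for illustration, and it assumes every shard has to return its top `offset + size` hits to the requesting node.

```python
from dataclasses import dataclass


@dataclass
class PagingCost:
    per_shard: int      # hits each shard must produce
    sorted_total: int   # hits the requesting node has to sort
    discarded: int      # hits thrown away after sorting


def deep_paging_cost(offset: int, size: int = 10, shards: int = 5) -> PagingCost:
    """Estimate the work for one page, assuming each shard returns its top offset + size hits."""
    per_shard = offset + size
    sorted_total = per_shard * shards
    return PagingCost(per_shard, sorted_total, sorted_total - size)


# First page: results 1 to 10.
first = deep_paging_cost(offset=0)
# Deep page: results 10,001 to 10,010.
deep = deep_paging_cost(offset=10_000)

print(first)
print(deep)
```

In this simplified model, the first page sorts 50 hits while the deep page sorts 50,050 and keeps only 10 of them, which matches the figures in the quote. The work grows with both the offset and the number of shards.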
A word of advice: don't use GSA as your search engine if you want to compute metrics and the like over the entire dataset. Use the right tool for the right job! As someone searching the web via Google, would you really go to the 1,001st result, or would you rather refine your search query to get just a few records?