Optimizing Elasticsearch Tagging: How to Boost Tag Search Efficiency by 300%
In this article, we'll share three core tips for optimizing Elasticsearch: how to design indices, how to optimize queries, and how to configure hardware. By adjusting field types, using filter caching, and adopting a sensible sharding strategy, developers can improve tag search performance by 300%, removing the bottlenecks that tag searches hit when handling massive volumes of data.
Why does my tag search always stall?
Many developers have run into this: the data is clearly tagged, but a search for it takes several seconds. Once the data volume passes a million records, deep pagination and timeouts become unavoidable. The culprit is usually the index design or the way queries are written. For example, storing tags as a text type, or using wildcards for fuzzy matching, forces Elasticsearch to burn through a lot of resources behind the scenes.
Three practical optimizations.
Lighten the load on the tag field.
Don't use the default "text" type for tags! Change it to "keyword" to skip tokenizing entirely. If you also need partial matching (for example, searching for "cellphone*"), map the field as keyword with a text sub-field analyzed by edge_ngram. In tests on datasets of over 10 million records, this design cut query times from 3 seconds to 800 milliseconds.
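As a sketch of that mapping, expressed as the JSON body you'd send when creating the index. The index layout follows the standard Elasticsearch mapping format, but the field name, analyzer name, and gram sizes are illustrative assumptions, not from the article:

```python
import json

# Hypothetical mapping: "tag" is a keyword field for exact matching,
# with a "prefix" text sub-field analyzed by an edge_ngram tokenizer
# for partial matches like "cellphone*".
mapping = {
    "settings": {
        "analysis": {
            "analyzer": {
                "tag_prefix": {
                    "type": "custom",
                    "tokenizer": "tag_edge_ngram",
                    "filter": ["lowercase"],
                }
            },
            "tokenizer": {
                "tag_edge_ngram": {
                    "type": "edge_ngram",
                    "min_gram": 2,   # shortest indexed prefix
                    "max_gram": 10,  # longest indexed prefix
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "tag": {
                "type": "keyword",  # exact match, no tokenizing
                "fields": {
                    "prefix": {
                        "type": "text",  # partial match via edge_ngram
                        "analyzer": "tag_prefix",
                    }
                },
            }
        }
    },
}

print(json.dumps(mapping, indent=2))
```

Exact tag lookups then hit `tag` directly, while prefix searches go through `tag.prefix` without any wildcard scanning.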
Let the filter carry the load.
When you need to search on multiple tags, use filter context instead of query context. For example, for a search like "unpaid + Beijing + VIP user," Elasticsearch's filter cache can speed up subsequent identical searches by five to eight times. Also remember to put the high-frequency conditions at the front of the clause, so the query plan runs more efficiently.
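The "unpaid + Beijing + VIP user" search might look like the body below, with every condition in filter context so each clause is cacheable. The field names and values (`user_level`, `city`, `order_status`) are hypothetical stand-ins:

```python
import json

# All three conditions go under "filter" rather than "must", so they
# don't affect scoring and can be served from the filter cache.
query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"user_level": "vip"}},     # high-frequency condition first
                {"term": {"city": "beijing"}},
                {"term": {"order_status": "unpaid"}},
            ]
        }
    }
}

print(json.dumps(query, indent=2))
```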
Are more shards always better?
I've seen people set the shard count to 100+ and actually get slower response times. The right number of shards depends on data volume and hardware: keep each shard between 10 and 50 GB. If tag queries are the main workload, I'd suggest setting up custom routing so that all documents with a given tag land on one fixed shard; searches then don't have to fan out across every shard. After one e-commerce platform adopted this technique, its QPS during peak periods doubled.
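A back-of-the-envelope helper for the 10-50 GB guideline above; the 30 GB mid-range target and the routing URLs in the comments are my own illustrative choices, not from the article:

```python
def suggested_shard_count(total_index_gb: float, target_shard_gb: float = 30) -> int:
    """Aim for shards between 10 and 50 GB; 30 GB is a mid-range target."""
    return max(1, round(total_index_gb / target_shard_gb))

# A 600 GB tag index would get about 20 primary shards.
print(suggested_shard_count(600))

# Custom routing: pass routing=<tag> on both writes and searches so all
# documents for one tag live on the same shard (URLs illustrative):
#   PUT  /tags_v1/_doc/1?routing=vip
#   GET  /tags_v1/_search?routing=vip
```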
Don't step into these traps.
Don't lean on "script_score", it's just too expensive. If you must use a script, remember to disable "_source" fetching and load only the fields you need. Another arrangement is separating hot data from cold data: historical data older than three months moves to spinning-disk nodes, while SSD nodes focus on serving hot-data queries. This arrangement can cut hardware costs by 40% without hurting performance.
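A sketch of a search body that skips "_source" and pulls only the needed fields from doc values, which avoids deserializing the full document on every hit. The field names "tag" and "user_id" are hypothetical:

```python
import json

# Disable _source fetching and read two fields via doc values instead;
# works for keyword/numeric fields, which tags usually are.
search_body = {
    "_source": False,
    "docvalue_fields": ["tag", "user_id"],
    "query": {"term": {"tag": "vip"}},
}

print(json.dumps(search_body, indent=2))
```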
How do you verify that the optimizations worked?
After optimizing your queries, don't rush to pack up and go home. Open Kibana's monitoring panel and watch two metrics closely: query_time and fetch_time. For performance testing, deliberately create extreme scenarios, such as querying 20 different tag combinations at once and doing deep pagination. One clever approach is to compare the Profile API logs before and after optimization.
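For the before/after comparison, you add `"profile": true` to the search body and sum `time_in_nanos` over the query tree in the response. The parsing below assumes the standard Profile API response layout; `sample_response` is a hand-written minimal example of that shape, not real cluster output:

```python
# A profiled version of a tag search: same query, plus "profile": true.
profiled_search = {
    "profile": True,
    "query": {"term": {"tag": "vip"}},
}

def total_query_nanos(profile_response: dict) -> int:
    """Sum time_in_nanos across every top-level query node on every shard."""
    total = 0
    for shard in profile_response["profile"]["shards"]:
        for search in shard["searches"]:
            for node in search["query"]:
                total += node["time_in_nanos"]
    return total

# Minimal fabricated response in the Profile API's shape, for illustration.
sample_response = {
    "profile": {
        "shards": [
            {"searches": [{"query": [{"time_in_nanos": 120000}]}]},
            {"searches": [{"query": [{"time_in_nanos": 80000}]}]},
        ]
    }
}

print(total_query_nanos(sample_response))
```

Run the same profiled search before and after an index change and compare the two totals to see where the time went.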