Description

Custom analyzers

##### autosuggest_analyzer
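Uses the edge n-gram algorithm (n-grams anchored to the start of each word) to split every token produced from the text into prefix terms between 1 and 20 characters long.

For example, take the text "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."

By default, a text field uses the standard analyzer, which produces these tokens:

the 2 quick brown foxes jumped over the lazy dog's bone

autosuggest_filter then expands them into:

t th the 2 q qu qui quic quick ...

Conclusion: this custom autosuggest analyzer can serve as a suggester, but the ordering of the returned results is not ideal. For autocompletion, the completion field type is still the recommended choice.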
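
The prefix expansion described for autosuggest_filter can be illustrated with a minimal Python sketch. It is not Elasticsearch code: `standard_like_tokens` is a hypothetical regex stand-in for the standard tokenizer plus the lowercase filter, and `edge_ngrams` only mimics what an edge n-gram filter with `min_gram` 1 and `max_gram` 20 is expected to emit.

```python
import re

def standard_like_tokens(text):
    """Rough stand-in for standard tokenizer + lowercase (an assumption,
    not Lucene's real word segmentation)."""
    return re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", text.lower())

def edge_ngrams(token, min_gram=1, max_gram=20):
    """Prefixes of `token` that are min_gram..max_gram characters long."""
    longest = min(len(token), max_gram)
    return [token[:size] for size in range(min_gram, longest + 1)]

text = "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
terms = [gram for tok in standard_like_tokens(text) for gram in edge_ngrams(tok)]

print(" ".join(terms[:9]))  # t th the 2 q qu qui quic quick
```

Because every prefix of every word becomes an indexed term, a partially typed word such as "qui" can match the document directly, which is what makes this analyzer usable for suggestions.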
##### ngram_analyzer
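Uses the plain n-gram algorithm (not anchored to the start of the word) to split every token produced from the text into its contiguous substrings between 2 and 9 characters long. Analysis result for the text "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.": th the he qu qui quic quick ui uic uick ic ick ck ...

Conclusion: ES ships a built-in ngram tokenizer whose output looks much the same as this custom ngram_analyzer, so the built-in ngram tokenizer can probably be used directly.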
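
The sliding-window behaviour of ngram_filter can be sketched in Python as follows. This is an illustration under assumptions rather than the Lucene implementation: the hypothetical `ngrams` helper takes an already lowercased token and, like the filter with its default settings is expected to do, yields nothing for tokens shorter than `min_gram`.

```python
def ngrams(token, min_gram=2, max_gram=9):
    """All contiguous substrings of `token` with length min_gram..max_gram,
    ordered by start offset and then by length."""
    grams = []
    for start in range(len(token)):
        longest = min(max_gram, len(token) - start)
        for size in range(min_gram, longest + 1):
            grams.append(token[start:start + size])
    return grams

print(ngrams("the"))    # ['th', 'the', 'he']
print(ngrams("quick"))  # ['qu', 'qui', 'quic', 'quick', 'ui', 'uic', 'uick', 'ic', 'ick', 'ck']
print(ngrams("2"))      # [] because the token is shorter than min_gram
```

The single-character token "2" produces no terms, which is consistent with it being absent from the analysis result listed above.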
##### english_analyzer
Applies Porter stemming to the tokens produced from the text field.

Analysis result for the text "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.": the quick brown fox jump over the lazi dog bone. Here foxes, jumped, lazy, and dog's are reduced to fox, jump, lazi, and dog.

Conclusion: unlike Solr, whose text type stems text by default, ES does not.
##### Custom filter
When Appbase.io creates an index, it writes the custom analyzers into the index settings:
```javascript
PUT appbase_test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autosuggest_analyzer": {
          "filter": ["lowercase", "asciifolding", "autosuggest_filter"],
          "tokenizer": "standard",
          "type": "custom"
        },
        "english_analyzer": {
          "filter": ["lowercase", "asciifolding", "porter_stem"],
          "tokenizer": "standard",
          "type": "custom"
        },
        "ngram_analyzer": {
          "filter": ["lowercase", "asciifolding", "ngram_filter"],
          "tokenizer": "standard",
          "type": "custom"
        }
      },
      "filter": {
        "autosuggest_filter": {
          "max_gram": "20",
          "min_gram": "1",
          "token_chars": ["letter", "digit", "punctuation", "symbol"],
          "type": "edge_ngram"
        },
        "ngram_filter": {
          "max_gram": "9",
          "min_gram": "2",
          "token_chars": ["letter", "digit", "punctuation", "symbol"],
          "type": "ngram"
        }
      }
    },
    "max_ngram_diff": "8",
    "max_shingle_diff": "8"
  },
  "mappings": {
    "properties": {
      "text1": {
        "analyzer": "standard",
        "fields": {
          "autosuggest": {
            "analyzer": "autosuggest_analyzer",
            "search_analyzer": "simple",
            "type": "text"
          },
          "english": {
            "analyzer": "english_analyzer",
            "type": "text"
          },
          "search": {
            "analyzer": "ngram_analyzer",
            "search_analyzer": "simple",
            "type": "text"
          }
        },
        "type": "text"
      }
    }
  }
}
```

Use minimum_should_match for fuzzy, similarity-based matching:

```javascript
POST appbase_test/_doc
{
  "text1": "theeventsfooddrinks12"
}

POST appbase_test/_search
{
  "query": {
    "match": {
      "text1.search": {
        "query": "eventsfooddrinks",
        "minimum_should_match": "60%"
      }
    }
  }
}
```
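
How a percentage in minimum_should_match turns into a required number of matching terms can be sketched in Python. The sketch rests on two assumptions that the source does not state. First, the query string is also split into n-grams, for example by setting `"analyzer": "ngram_analyzer"` in the match query; with `search_analyzer` set to `simple` as in the mapping, the query string would stay a single term. Second, the indexed value is treated as one token before n-gramming. The helper names are hypothetical.

```python
def ngrams(token, min_gram=2, max_gram=9):
    """Contiguous substrings of `token` with length min_gram..max_gram."""
    return [token[start:start + size]
            for start in range(len(token))
            for size in range(min_gram, min(max_gram, len(token) - start) + 1)]

def required_matches(optional_clauses, percent):
    """Positive-percentage form of minimum_should_match: the computed
    number of required clauses is rounded down."""
    return (optional_clauses * percent) // 100

indexed_terms = set(ngrams("theeventsfooddrinks12"))
query_terms = ngrams("eventsfooddrinks")

needed = required_matches(len(query_terms), 60)
matched = sum(term in indexed_terms for term in query_terms)

# Number of query terms, terms required at 60%, terms found, and the verdict.
print(len(query_terms), needed, matched, matched >= needed)
```

Since "eventsfooddrinks" is a substring of the indexed value, every query n-gram is found. A query that shares only part of its n-grams with the document would still match as long as at least 60% of them are present.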