Notes

#### Custom analyzers

##### autosuggest_analyzer

Uses an edge n-gram algorithm (n-grams anchored at word boundaries) to further split each term produced by the tokenizer into adjacent prefixes, with a minimum length of 1 and a maximum length of 20. For example, for the text "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.", a text field's default standard analyzer first produces these tokens:

the 2 quick brown foxes jumped over the lazy dog's bone

autosuggest_filter then expands each term into edge n-grams: t th the 2 q qu qui quic quick ...
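Once the `appbase_test` index defined later in this note exists, the same token output can be inspected with the `_analyze` API (a sketch):

```javascript
POST appbase_test/_analyze
{
  "analyzer": "autosuggest_analyzer",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
```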

Conclusion: although this custom autosuggest_analyzer can serve suggester-style queries, the ranking of the returned results is not ideal. For real autocompletion, the completion field type is still recommended.
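For comparison, a minimal sketch of the recommended `completion` field type (the index name `suggest_demo` and the field name `title_suggest` are hypothetical, not part of the setup above):

```javascript
PUT suggest_demo
{
  "mappings": {
    "properties": {
      "title_suggest": { "type": "completion" }
    }
  }
}

POST suggest_demo/_search
{
  "suggest": {
    "title-suggest": {
      "prefix": "qui",
      "completion": { "field": "title_suggest" }
    }
  }
}
```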


##### ngram_analyzer

Uses an n-gram algorithm without word boundaries: each token is further split into all adjacent substrings with a minimum length of 2 and a maximum length of 9. For the text "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.", the term "the" produces th the he, and "quick" produces qu qui quic quick ui uic uick ic ick ck.

Conclusion: ES ships a built-in ngram tokenizer whose output is essentially the same as this custom ngram_analyzer, so the built-in ngram tokenizer can likely be used directly.
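The built-in ngram tokenizer can be tried directly through the `_analyze` API with an inline tokenizer definition (a sketch; no index is required):

```javascript
POST _analyze
{
  "tokenizer": {
    "type": "ngram",
    "min_gram": 2,
    "max_gram": 9
  },
  "text": "quick"
}
```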


##### english_analyzer

Applies Porter stemming: each token produced from the text field is reduced to its stem.

The text "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone." is analyzed into: the quick brown fox jump over the lazi dog bone, where foxes, jumped, lazy, and dog's are stemmed to fox, jump, lazi, and dog.

Conclusion: this differs from Solr, whose text field type applies stemming by default; ES does not.
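Note that ES does ship a built-in `english` analyzer that stems by default (it also removes stopwords, which the custom `english_analyzer` above does not); a sketch to compare its output:

```javascript
POST _analyze
{
  "analyzer": "english",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
```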


#### Custom filters



When Appbase.io creates an index, it writes the custom analyzers and filters into the index settings:
```javascript
PUT appbase_test
{
    "settings": {
            "analysis": {
                "analyzer": {
                    "autosuggest_analyzer": {
                        "filter": ["lowercase", "asciifolding", "autosuggest_filter"],
                        "tokenizer": "standard",
                        "type": "custom"
                    },
                    "english_analyzer": {
                        "filter": ["lowercase", "asciifolding", "porter_stem"],
                        "tokenizer": "standard",
                        "type": "custom"
                    },
                    "ngram_analyzer": {
                        "filter": ["lowercase", "asciifolding", "ngram_filter"],
                        "tokenizer": "standard",
                        "type": "custom"
                    }
                },
                "filter": {
                    "autosuggest_filter": {
                        "max_gram": "20",
                        "min_gram": "1",
                        "token_chars": ["letter", "digit", "punctuation", "symbol"],
                        "type": "edge_ngram"
                    },
                    "ngram_filter": {
                        "max_gram": "9",
                        "min_gram": "2",
                        "token_chars": ["letter", "digit", "punctuation", "symbol"],
                        "type": "ngram"
                    }
                }
            },
            "max_ngram_diff": "8",
            "max_shingle_diff": "8"
        },
    "mappings": {
        "properties": {
            "text1": {
                "type": "text",
                "analyzer": "standard",
                "fields": {
                    "autosuggest": {
                        "type": "text",
                        "analyzer": "autosuggest_analyzer",
                        "search_analyzer": "simple"
                    },
                    "english": {
                        "type": "text",
                        "analyzer": "english_analyzer"
                    },
                    "search": {
                        "type": "text",
                        "analyzer": "ngram_analyzer",
                        "search_analyzer": "simple"
                    }
                }
            }
        }
    }
}

```

Use `minimum_should_match` for approximate matching: requiring only 60% of the query's terms to match allows partially overlapping text to score.

```javascript
POST appbase_test/_doc
{
  "text1": "theeventsfooddrinks12"
}

POST appbase_test/_search
{
  "query": {
    "match": {
      "text1.search": {
        "query": "eventsfooddrinks",
        "minimum_should_match": "60%"
      }
    }
  }
}
```
