How to use stopwords in Elasticsearch

Time: 2015-04-13 03:56:09

Tags: elasticsearch token

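I have Elasticsearch 1.5 running on my server.

Specifically, I want to create three fields:

1. name

2. description

3. nickname

I want to configure stopwords for the description and nickname fields so that unwanted stopwords are removed automatically when I insert data into Elasticsearch. I have tried many times, but it does not work.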

curl -X POST http://127.0.0.1:9200/tryoindex/ -d'
{
  "settings": {
    "analysis": {
      "filter": {
        "custom_english_stemmer": {
          "type": "stemmer",
          "name": "english"
        },
        "snowball": {
          "type" : "snowball",
          "language" : "English"
        }
      },
      "analyzer": {
        "custom_lowercase_stemmed": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "custom_english_stemmer",
            "snowball"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "_all" : {"enabled" : true},
      "properties": {
        "text": {
          "type": "string",
          "analyzer": "custom_lowercase_stemmed"
        }
      }
    }
  }
}'

curl -X POST "http://localhost:9200/tryoindex/nama/1" -d '{
  "text" : "Tryolabs running monkeys KANGAROOS and jumping elephants jum is your"
}'

curl "http://localhost:9200/tryoindex/nama/_search?pretty=1" -d '{
"query": {
    "query_string": {
        "query": "Tryolabs running monkeys KANGAROOS and jumping elephants jum is your",
        "fields": ["text"]
    }
  }
}'

2 answers:

Answer 0: (score: 1)

Change the analyzer section of your settings to

"analyzer": {
    "custom_lowercase_stemmed": {
      "tokenizer": "standard",
      "filter": [
        "stop",
        "lowercase",
        "custom_english_stemmer",
        "snowball"
      ]
    }
  }

To verify the change, use

curl -XGET 'localhost:9200/tryoindex/_analyze?analyzer=custom_lowercase_stemmed' -d 'testing this is stopword testing'

and look at the tokens that come back:

{"tokens":[{"token":"test","start_offset":0,"end_offset":7,"type":"<ALPHANUM>","position":1},{"token":"stopword","start_offset":16,"end_offset":24,"type":"<ALPHANUM>","position":4},{"token":"test","start_offset":25,"end_offset":32,"type":"<ALPHANUM>","position":5}]}

PS: If you do not want the stemmed form of "testing" (that is, "test"), remove the stemmer filters from the chain.
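
For the three fields in the question, a minimal sketch of a full index definition might look like the following. It assumes Elasticsearch 1.x; the index name `products`, the type name `item`, and the analyzer name `lowercase_stop` are made up for illustration, and only the field names come from the question. Two choices here are mine: `lowercase` runs before `stop`, because the default English stopword list is lowercase and the `stop` filter is case-sensitive unless told otherwise, and the stemmers are left out as per the PS. Also note that the question declares its mapping under the type `test` but indexes into the type `nama`, so the custom analyzer would not be applied to those documents; the sketch assumes one type name for both.

```shell
# Sketch for Elasticsearch 1.x. "products", "item" and "lowercase_stop" are
# hypothetical names; name/description/nickname come from the question.
cat > stopword_index.json <<'EOF'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase_stop": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  },
  "mappings": {
    "item": {
      "properties": {
        "name":        { "type": "string" },
        "description": { "type": "string", "analyzer": "lowercase_stop" },
        "nickname":    { "type": "string", "analyzer": "lowercase_stop" }
      }
    }
  }
}
EOF

# Create the index only when ES_URL is set, e.g. ES_URL=http://localhost:9200
if [ -n "${ES_URL:-}" ]; then
  curl -X POST "$ES_URL/products/" -d @stopword_index.json
fi
```

Documents would then have to be indexed under the same type, for example `$ES_URL/products/item/1`, for the mapping to take effect.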

Answer 1: (score: 0)

You need to use the stop token filter in your analyzer's filter chain.
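
If the built-in English list is not what you want, the `stop` token filter also accepts an explicit `stopwords` list. The following is only a sketch under assumptions: the index name `stoptest`, the filter name `my_stop`, the analyzer name `custom_stop_analyzer`, and the three chosen words are all hypothetical, and the request style follows the 1.x examples in this thread.

```shell
# Sketch for Elasticsearch 1.x. "stoptest", "my_stop" and
# "custom_stop_analyzer" are hypothetical names; the word list is an example.
cat > custom_stop_settings.json <<'EOF'
{
  "settings": {
    "analysis": {
      "filter": {
        "my_stop": {
          "type": "stop",
          "stopwords": ["and", "is", "your"]
        }
      },
      "analyzer": {
        "custom_stop_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "my_stop"]
        }
      }
    }
  }
}
EOF

# Talk to the cluster only when ES_URL is set, e.g. ES_URL=http://localhost:9200
if [ -n "${ES_URL:-}" ]; then
  # Create the index with the custom stop filter.
  curl -X POST "$ES_URL/stoptest/" -d @custom_stop_settings.json
  # Inspect the tokens the analyzer produces for a sample sentence.
  curl -XGET "$ES_URL/stoptest/_analyze?analyzer=custom_stop_analyzer" -d 'Tryolabs and KANGAROOS is your'
fi
```

With this chain I would expect the `_analyze` call to return only `tryolabs` and `kangaroos`, since the other three words are lowercased and then dropped by `my_stop`.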