Question

I want to get the frequency of a term made up of two words, for example "green energy".

I can get the frequencies of "green" and "energy" individually, for example:

"function_score":
{
    "filter" : {
        "terms" : { "content" : ["energy","green"]}
    },
    "script_score": {
        "script": "_index['content']['energy'].tf() + _index['content']['green'].tf()",
        "lang":"groovy"
    }
}

This works fine. But how can I find the frequency of a multi-word term such as "green energy"? Something like

_index['content']['green energy'].tf()

does not work.

Answer 1

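I think it depends on how your data is indexed and on what you need at search time. For example, if you have the text "indirect green energy to spare" (that is, "green" and "energy" next to each other) and you want your script to match "green energy" and give you a tf() value for it, then you need to index your data accordingly. As you said yourself, "the frequency of the term 'green energy'" comes down to somehow producing the term "green energy" at index time.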
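To make the point concrete, here is a minimal Python sketch of why the lookup in the question returns nothing. It is only a rough stand-in for a standard tokenizer plus a lowercase filter, not Elasticsearch's actual analysis chain; the helper name `unigram_terms` is hypothetical.

```python
import re
from collections import Counter


def unigram_terms(text):
    # Hypothetical helper: roughly mimics a standard tokenizer followed by
    # a lowercase filter by splitting on non-word characters.
    return [token.lower() for token in re.findall(r"\w+", text)]


doc = "indirect green energy to spare"
tf = Counter(unigram_terms(doc))

print(tf["green"])         # 1
print(tf["energy"])        # 1
print(tf["green energy"])  # 0: no single term "green energy" was ever produced
```

With single-word terms only, there is simply no entry for "green energy" to count, which is why the field has to be analyzed differently.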

In your case, one idea is to add another sub-field to "content" that uses a "shingles" analyzer:

PUT /some_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_shingle_filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 2,
          "output_unigrams": false
        }
      },
      "analyzer": {
        "my_shingle_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_shingle_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "some_type": {
      "properties": {
        "content": {
          "type": "string",
          "index": "analyzed",
          "fields": {
            "with_shingles": {
              "type": "string",
              "analyzer": "my_shingle_analyzer"
            }
          }
        }
      }
    }
  }
}

Then, in your function score, you would refer to the .with_shingles field:

{
  "query": {
    "function_score": {
      "filter": {
        "terms": {
          "content": [
            "energy",
            "green"
          ]
        }
      },
      "script_score": {
        "script": "_index['content.with_shingles']['green energy'].tf()",
        "lang": "groovy"
      }
    }
  }
}

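This is just an example to show that you need to index your data accordingly in order to get the .tf() you want. In my example I assumed you want to search for the exact term "green energy", so I used "shingles". For the sample text above, this produces a list of analyzed terms like this: "content.with_shingles": ["energy to","green energy","indirect green","to spare"].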
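As a rough illustration of what the shingle filter does with min_shingle_size and max_shingle_size both set to 2 and output_unigrams set to false, the following Python sketch builds two-word shingles from the sample text. It is an approximation for explanation only, not Elasticsearch's implementation, and the function name `shingles` is hypothetical.

```python
import re
from collections import Counter


def shingles(text, size=2):
    # Hypothetical helper: lowercase word tokens, then join every run of
    # `size` adjacent tokens with a space (no single-word terms are kept).
    tokens = [token.lower() for token in re.findall(r"\w+", text)]
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


doc = "indirect green energy to spare"
terms = shingles(doc)
print(sorted(terms))
# ['energy to', 'green energy', 'indirect green', 'to spare']

tf = Counter(terms)
print(tf["green energy"])  # 1
```

Because "green energy" now exists as a term of its own, a per-term frequency lookup on that field has something to count.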

Elasticsearch phrase term frequency .tf() with multiple words
