Elasticsearch: lowercasing data that is already indexed

Asked: 2017-04-18 15:21:54

Tags: python elasticsearch lucene

I have documents indexed in Elasticsearch. A sample document looks like this:

{
    "_index": "processed_tweets",
    "_type": "processed",
    "_id": "830403820580663296",
    "_score": 1,
    "_source": {
      "at": [
        "@LouisDasch"
      ],
      "original_tweet_id": "830398288352403457",
      "id_str": "830403820580663296",
      "trigrams": [
        "blessed lourdes lady",
        "lourdes lady feast",
        "lady feast day",
        "feast day wishing"
      ],
      "hashtags": [
        "#Catholic"
      ],
      "id_tweet_creator": "487735029",
      "tokens": [
        "blessed",
        "lourdes",
        "lady",
        "feast",
        "day",
        "wishing"
      ],
      "bigrams": [
        "blessed lourdes",
        "lourdes lady",
        "lady feast",
        "feast day",
        "day wishing"
      ],
      "retweeted": true
    }
  }

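I want to lowercase every hashtag in the "hashtags" field, for all the documents I have indexed. For example, "hashtags": ["#Catholic"] should become "hashtags": ["#catholic"]. What is the best (least time-consuming) way to update each keyword to its lowercase equivalent while keeping the "#"?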

1 answer:

Answer 0: (score: 0)

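If you are using ES 5.0 or later, there is a scripting language called "Painless", introduced in that release. It can help you update the field in place, and it runs very fast.

Please see the following link for more information.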

https://www.elastic.co/guide/en/elasticsearch/reference/5.0/modules-scripting-painless.html
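
To make the suggestion above concrete, here is a minimal sketch of how a Painless script could be sent through the _update_by_query API from Python. It is not part of the original answer. The host URL, the function names, and the choice of the standard-library urllib are all assumptions made for illustration. The script key is "inline" on ES 5.0; newer releases renamed it to "source", so check the documentation for your version.

```python
import json
import urllib.request

# Painless script: lowercase each entry of the "hashtags" array in place.
# toLowerCase() leaves the leading "#" untouched.
LOWERCASE_HASHTAGS = (
    "for (int i = 0; i < ctx._source.hashtags.size(); i++) {"
    " ctx._source.hashtags[i] = ctx._source.hashtags[i].toLowerCase();"
    " }"
)


def build_body(script_key="inline"):
    """Build the _update_by_query request body (hypothetical helper).

    script_key: "inline" on ES 5.0; assumed to be "source" on newer releases.
    """
    return {
        "script": {"lang": "painless", script_key: LOWERCASE_HASHTAGS},
        # Only visit documents that actually have a hashtags field.
        "query": {"exists": {"field": "hashtags"}},
    }


def lowercase_hashtags(host="http://localhost:9200", index="processed_tweets"):
    """POST the update to the cluster; the host default is an assumption."""
    request = urllib.request.Request(
        f"{host}/{index}/_update_by_query?conflicts=proceed",
        data=json.dumps(build_body()).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling lowercase_hashtags() against a running cluster would rewrite the documents server-side, which avoids fetching and reindexing each one from the client. Trying it on a copy of the index first is a sensible precaution.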