Parsing Elasticsearch index output and extracting the index names

Time: 2013-03-22 20:38:28

Tags: regex perl xml-parsing elasticsearch

In the output below, I need to pick out the first entry of each section; each of those entries is the name of an ElasticSearch index.

For example: nprod@n_docs, platform-api-stage, nprod@janeuk_classic, nprod@delista.com@1

So I know they sit between character patterns like

{ "

and

" : {
    "settings" : {

How would my script grab these values so that I can write them to another file?

My output looks like this:

{
  "nprod@n_docs" : {
    "settings" : {
      "index.analysis.analyzer.rwn_text_analyzer.char_filter" : "html_strip",
      "index.analysis.analyzer.rwn_text_analyzer.language" : "English",
      "index.translog.disable_flush" : "false",
      "index.version.created" : "190199",
      "index.number_of_replicas" : "1",
      "index.number_of_shards" : "5",
      "index.analysis.analyzer.rwn_text_analyzer.type" : "snowball",
      "index.translog.flush_threshold_size" : "60",
      "index.translog.flush_threshold_period" : "",
      "index.translog.flush_threshold_ops" : "500"
    }
  },
  "platform-api-stage" : {
    "settings" : {
      "index.analysis.analyzer.api_edgeNGram.type" : "custom",
      "index.analysis.analyzer.api_edgeNGram.filter.0" : "api_nGram",
      "index.analysis.filter.api_nGram.max_gram" : "50",
      "index.analysis.analyzer.api_edgeNGram.filter.1" : "lowercase",
      "index.analysis.analyzer.api_path.type" : "custom",
      "index.analysis.analyzer.api_path.tokenizer" : "path_hierarchy",
      "index.analysis.filter.api_nGram.min_gram" : "2",
      "index.analysis.filter.api_nGram.type" : "edgeNGram",
      "index.analysis.analyzer.api_edgeNGram.tokenizer" : "standard",
      "index.analysis.filter.api_nGram.side" : "front",
      "index.analysis.analyzer.api_path.filter.0" : "lowercase",
      "index.number_of_shards" : "5",
      "index.number_of_replicas" : "1",
      "index.version.created" : "200599"
    }
  },
  "nprod@janeuk_classic" : {
    "settings" : {
      "index.analysis.analyzer.n_text_analyzer.language" : "English",
      "index.translog.disable_flush" : "false",
      "index.version.created" : "190199",
      "index.number_of_replicas" : "1",
      "index.number_of_shards" : "5",
      "index.analysis.analyzer.n_text_analyzer.char_filter" : "html_strip",
      "index.analysis.analyzer.n_text_analyzer.type" : "snowball",
      "index.translog.flush_threshold_size" : "60",
      "index.translog.flush_threshold_period" : "",
      "index.translog.flush_threshold_ops" : "500"
    }
  },
  "nprod@delista.com@1" : {
    "settings" : {
      "index.analysis.analyzer.n_text_analyzer.language" : "English",
      "index.translog.disable_flush" : "false",
      "index.version.created" : "191199",
      "index.number_of_replicas" : "1",
      "index.number_of_shards" : "5",
      "index.analysis.analyzer.n_text_analyzer.char_filter" : "html_strip",
      "index.analysis.analyzer.n_text_analyzer.type" : "snowball",
      "index.translog.flush_threshold_size" : "60",
      "index.translog.flush_threshold_period" : "",
      "index.translog.flush_threshold_ops" : "500"
    }
  },

1 Answer:

Answer 0 (score: 3)

That's JSON, so there is no need for a regex. Read the data and parse it using JSON::XS.

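use strict;
use warnings;
use JSON::XS qw( decode_json );

# Path of the file holding the JSON output (assumed here to be
# passed as the first command-line argument).
my $qfn = $ARGV[0];

# Slurp the whole file as raw bytes; decode_json expects UTF-8 bytes.
my $file;
{ 
   open(my $fh, '<:raw', $qfn)
      or die("Can't open \"$qfn\": $!\n");
   local $/;   # undefine the record separator to read everything at once
   $file = <$fh>;
}

# $data is now a reference to a hash keyed by index name.
my $data = decode_json($file);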

Then simply traverse the tree to get the information you want. The index names are the top-level keys:

my @index_names = keys(%$data);
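
To cover the "write them to another file" part of the question, the steps above can be sketched end to end as follows. This is only a minimal sketch under a few assumptions: the helper names index_names and write_names and the output file name index_names.txt are made up for illustration, the sample JSON is a trimmed stand-in for the real output, and JSON::PP (which ships with Perl and exports the same decode_json function) is used in place of JSON::XS so that it runs without installing anything.

```perl
use strict;
use warnings;
use JSON::PP qw( decode_json );   # JSON::XS can be swapped in unchanged

# Return the top-level keys of the settings JSON, i.e. the index names.
# Perl hashes do not preserve key order, so the names are sorted.
# (index_names is a hypothetical helper, not part of any library.)
sub index_names {
    my ($json_text) = @_;
    my $data = decode_json($json_text);
    return sort keys %$data;
}

# Write one index name per line to the given file.
# (write_names is likewise a hypothetical helper.)
sub write_names {
    my ($out_qfn, @names) = @_;
    open(my $out, '>', $out_qfn)
        or die("Can't create \"$out_qfn\": $!\n");
    print {$out} "$_\n" for @names;
    close($out)
        or die("Can't close \"$out_qfn\": $!\n");
}

# Small stand-in for the real output, with the settings trimmed.
my $sample = <<'END_JSON';
{
  "nprod@n_docs" : { "settings" : { "index.number_of_shards" : "5" } },
  "platform-api-stage" : { "settings" : { "index.number_of_shards" : "5" } }
}
END_JSON

my @names = index_names($sample);
write_names('index_names.txt', @names);   # assumed output file name
```

Because the names come from the parsed hash rather than from a pattern match, characters such as @ and . in the index names need no special handling. Note that decode_json requires a complete JSON document, so the input has to include the final closing brace.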