Question

I'm working with the JSON output of a tool (massdns), which looks like this:

{"query_name":"1eaff.example.com.","query_type":"A","resp_name":"ns02.example.com.","resp_type":"A","data":"<ip>"}
{"query_name":"1cf0e.example.com.","query_type":"A","resp_name":"ns01.example.com.","resp_type":"A","data":"<ip>"}
{"query_name":"1cf0e.example.com.","query_type":"A","resp_name":"ns02.example.com.","resp_type":"A","data":"<ip>"}
{"query_name":"1fwsjz2f4ok1ot2hh2illyd1-wpengine.example.com.","query_type":"A","resp_name":"ns01.example.com.","resp_type":"A","data":"<ip>"}
{"query_name":"1fwsjz2f4ok1ot2hh2illyd1-wpengine.example.com.","query_type":"A","resp_name":"ns02.example.com.","resp_type":"A","data":"<ip>"}
{"query_name":"1a811.example.com.","query_type":"A","resp_name":"ns01.example.com.","resp_type":"A","data":"<ip>"}

Using jq with slurp (-s), I can neatly output the results in the format I want:

jq -s '{ a: "xxx", "b": 123, domains: map(select(.resp_type=="A") | .resp_name[:-1] ) | unique }'

This produces JSON such as:

{
  "a": "xxx",
  "b": 123,
  "domains": [
    "ns01.example.com",
    "ns02.example.com"
  ]
}
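To make the semantics of the slurp filter concrete, here is a rough Python equivalent. Python is used purely for illustration, and the name slurp_summary is hypothetical. Like -s, it parses every line into one in-memory list before filtering. That up-front list is exactly the step that stops scaling once the input reaches gigabytes.

```python
import json

# Sample massdns-style lines from the question (IPs elided as "<ip>").
SAMPLE = """\
{"query_name":"1eaff.example.com.","query_type":"A","resp_name":"ns02.example.com.","resp_type":"A","data":"<ip>"}
{"query_name":"1cf0e.example.com.","query_type":"A","resp_name":"ns01.example.com.","resp_type":"A","data":"<ip>"}
{"query_name":"1cf0e.example.com.","query_type":"A","resp_name":"ns02.example.com.","resp_type":"A","data":"<ip>"}
{"query_name":"1fwsjz2f4ok1ot2hh2illyd1-wpengine.example.com.","query_type":"A","resp_name":"ns01.example.com.","resp_type":"A","data":"<ip>"}
{"query_name":"1fwsjz2f4ok1ot2hh2illyd1-wpengine.example.com.","query_type":"A","resp_name":"ns02.example.com.","resp_type":"A","data":"<ip>"}
{"query_name":"1a811.example.com.","query_type":"A","resp_name":"ns01.example.com.","resp_type":"A","data":"<ip>"}
"""


def slurp_summary(text):
    # Like `jq -s`: parse every record into one list before doing anything else.
    records = [json.loads(line) for line in text.splitlines() if line.strip()]
    # map(select(.resp_type=="A") | .resp_name[:-1]): trim the trailing dot per record
    names = [r["resp_name"][:-1] for r in records if r["resp_type"] == "A"]
    # unique: de-duplicate and sort
    return {"a": "xxx", "b": 123, "domains": sorted(set(names))}


print(json.dumps(slurp_summary(SAMPLE), indent=2))
```

Run on the sample, this prints the same JSON as the jq command above.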

(See the example on jqplay.)

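The problem arises when my input grows to many thousands of lines (gigabytes of data). Slurp then uses too much memory, and jq exits with an error.

I found the --stream option, which can handle large inputs, but I'm struggling to find a way to get the same output with it. Is it possible to use jq with --stream (rather than --slurp) to get the desired output for very large input files?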
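For context on why --stream is awkward here: it does not hand the filter whole records. Instead, every input is broken into events of the form [path, leaf], and each non-empty object or array is closed by a one-element [path] event. The Python function below is a hypothetical sketch that reproduces that event order for an already-parsed value, following the definition of jq's tostream builtin. It is not how jq parses its input.

```python
import json


def tostream(value, path=()):
    """Yield jq-style --stream events for an already-parsed JSON value.

    Children are visited before their parent. A leaf (a scalar or an empty
    container) yields [path, leaf]. A non-empty container yields a closing
    event [path + [last_key]] after all of its children.
    """
    if isinstance(value, dict):
        children = list(value.items())
    elif isinstance(value, list):
        children = list(enumerate(value))
    else:
        children = []
    for key, child in children:
        yield from tostream(child, path + (key,))
    if children:
        yield [list(path) + [children[-1][0]]]  # closing event
    else:
        yield [list(path), value]  # leaf event


record = json.loads(
    '{"query_name":"1eaff.example.com.","query_type":"A",'
    '"resp_name":"ns02.example.com.","resp_type":"A","data":"<ip>"}'
)
for event in tostream(record):
    print(json.dumps(event))
```

One record from the question arrives as six events: five [path, leaf] pairs plus a closing [["data"]]. Recovering whole records therefore means reassembling them, for example with jq's fromstream. Reading whole records with inputs avoids that work entirely.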

Answer 1

--stream would make this task too complicated. Use reduce together with the --null-input/-n option instead:

jq -n '{a: "xxx", b: 123} | .domains = (reduce (inputs|select(.query_type == "A").resp_name) as $d ({}; . + {($d): null}) | keys_unsorted | map(.[:-1]))'

Keeping the domains as keys of an object, rather than as elements of an array, makes this script more efficient in both memory consumption and CPU time. In jq, objects are added by merging: all key-value pairs from both objects are inserted into a single combined object. If both objects contain a value for the same key, the object on the right of the + wins. Hence there is no need for unique.

Trimming the last character (.[:-1]) off every resp_name also slows the process down. Doing it once, on the resulting array, is more efficient.
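Here is a rough Python analogue of the streaming reduce above. Python is used only for illustration, and summarize and add_name are hypothetical names. Records are parsed one at a time from any iterable of lines, such as an open file. A dict plays the role of the jq object whose keys act as a set, and the trailing dot is trimmed once per unique name at the end.

```python
import io
import json
from functools import reduce

# Sample massdns-style lines from the question (IPs elided as "<ip>").
SAMPLE = """\
{"query_name":"1eaff.example.com.","query_type":"A","resp_name":"ns02.example.com.","resp_type":"A","data":"<ip>"}
{"query_name":"1cf0e.example.com.","query_type":"A","resp_name":"ns01.example.com.","resp_type":"A","data":"<ip>"}
{"query_name":"1cf0e.example.com.","query_type":"A","resp_name":"ns02.example.com.","resp_type":"A","data":"<ip>"}
{"query_name":"1fwsjz2f4ok1ot2hh2illyd1-wpengine.example.com.","query_type":"A","resp_name":"ns01.example.com.","resp_type":"A","data":"<ip>"}
{"query_name":"1fwsjz2f4ok1ot2hh2illyd1-wpengine.example.com.","query_type":"A","resp_name":"ns02.example.com.","resp_type":"A","data":"<ip>"}
{"query_name":"1a811.example.com.","query_type":"A","resp_name":"ns01.example.com.","resp_type":"A","data":"<ip>"}
"""


def add_name(seen, name):
    # Like `. + {($d): null}`: adding a key that already exists just
    # overwrites it, so duplicate names collapse into a single entry.
    seen[name] = None
    return seen


def summarize(lines):
    # Lazily parse one record per line, like `inputs` under `jq -n`.
    records = (json.loads(line) for line in lines if line.strip())
    # select(.query_type == "A").resp_name
    names = (r["resp_name"] for r in records if r["query_type"] == "A")
    # reduce ... ({}; ...): the dict's keys act as an insertion-ordered set
    seen = reduce(add_name, names, {})
    # keys_unsorted | map(.[:-1]): trim the trailing dot once per unique name
    return {"a": "xxx", "b": 123, "domains": [name[:-1] for name in seen]}


print(json.dumps(summarize(io.StringIO(SAMPLE)), indent=2))
```

With a real file, you would pass the open file object instead of io.StringIO, for example summarize(open("massdns.json")) with a hypothetical filename. Memory then grows with the number of distinct names rather than the number of lines. Note that keys_unsorted keeps first-seen order (ns02 before ns01 here). Using jq's keys instead, or sorted(seen) in the sketch, gives the sorted order that unique produced in the question.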

See it on jqplay.
