Using jq to parse JSON lines to check flapping key values in sequence

Date: 2018-01-05 00:21:06

Tags: json jq

I have a file containing JSON lines, and I need to validate it based on the sequence of each JSON's "alert.status" value.

Example of valid JSON lines:

{"id":123,"code":"foo","severity":"Critical","severityCode":1, "property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"On"}}
{"id":456,"code":"bar","severity":"High","severityCode":2,"property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"On"}}
{"id":123,"code":"foo","severity":"Critical","severityCode":1,"property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"Off"}}
{"id":456,"code":"bar","severity":"High","severityCode":2,"property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"Off"}}
{"id":123,"code":"foo","severity":"Critical","severityCode":1,"property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"On"}}
{"id":456,"code":"bar","severity":"High","severityCode":2,"property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"On"}}

The file above is valid because the repeated JSONs (lines 1, 5 and lines 2, 6) have a status that flaps "On", "Off", "On", and so on.

Example of invalid JSON lines:

{"id":123,"code":"foo","severity":"Critical","severityCode":1, "property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"On"}}
{"id":456,"code":"bar","severity":"High","severityCode":2,"property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"On"}}
{"id":123,"code":"foo","severity":"Critical","severityCode":1,"property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"On"}}
{"id":456,"code":"bar","severity":"High","severityCode":2,"property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"Off"}}
{"id":123,"code":"foo","severity":"Critical","severityCode":1,"property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"Off"}}
{"id":456,"code":"bar","severity":"High","severityCode":2,"property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"Off"}}

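The file above is invalid because the JSONs on lines 1 and 3 are duplicates whose "status" value stays the same instead of flapping on and off.

I tried using jq to read the JSON lines into a JSON array: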

jq --slurp 'map(select(. >= 2))' jsonfile > jsonarray

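But since the sequence of the lines matters, I don't think I can use group_by to find the duplicates (the result of group_by is sorted).

I am thinking of inserting a new key with an incrementing number into each JSON, so that after using group_by we can sort the results by this new key to recover the sequence.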
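To make that idea concrete, here is a rough sketch in Python rather than jq. The helper name `group_with_index` and the shortened sample records are hypothetical, introduced only for illustration. Each record is paired with its line index, the pairs are sorted by the record's content minus `alert` (mimicking a sorting group-by), and the index restores the original sequence inside each group.

```python
import json
from itertools import groupby

def group_with_index(records):
    """Group records by everything except 'alert', keeping line order within groups."""
    def key(item):
        _, rec = item
        rest = {k: v for k, v in rec.items() if k != "alert"}
        return json.dumps(rest, sort_keys=True)

    # Sort by (group key, index): the sort mimics a sorting group-by,
    # and the index restores the original sequence inside each group.
    indexed = sorted(enumerate(records), key=lambda item: (key(item), item[0]))
    return [[rec for _, rec in grp] for _, grp in groupby(indexed, key=key)]

# Hypothetical, shortened records (not the full lines from the file).
records = [
    {"id": 456, "alert": {"status": "On"}},
    {"id": 123, "alert": {"status": "On"}},
    {"id": 456, "alert": {"status": "Off"}},
]
groups = group_with_index(records)
print(groups)
```

The groups themselves come out sorted by key, which is acceptable here because only the order inside each group matters for the check.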

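Is there a way in jq to group by all keys except two (in this case "status" and the new key with the incrementing number)?

Is there a better way to solve this problem?

Thank you very much for your help!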

2 Answers:

Answer 0 (score: 3)

  

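"I don't think I can use group_by to find the duplicates (the result of group_by is sorted)."

That is correct, but it is easy to define a non-sorting "group_by" which, as we shall see, can also easily be used to group by all keys except specified ones.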

GROUPS_BY

First, here is a simple filter that preserves the original order of the items within each group:

# The filter, f, must produce a string for each item in `stream`
def GROUPS_BY(stream; f):
  reduce stream as $x ({}; .[$x|f] += [$x] ) | .[] ;

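The "S" in the name emphasizes that the function is stream-oriented in two ways: its first argument is a stream, and it produces a stream of groups. The name is capitalized to emphasize the difference from the existing built-in.

Example

To illustrate how to group by all keys except particular ones, consider this example (taken from another SO question):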

def data:
  [{"foo":1,"bar":"a","baz":"whatever"},
   {"foo":1,"bar":"a","baz":"hello"},
   {"foo":1,"bar":"b","baz":"world"}] ;

GROUPS_BY(data[]; del(.baz) | tostring)

Output

[{"foo":1,"bar":"a","baz":"whatever"},{"foo":1,"bar":"a","baz":"hello"}]
[{"foo":1,"bar":"b","baz":"world"}]
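For comparison, the same order-preserving grouping can be sketched in Python. This is only an analogue of the jq filter, not a translation of it; the names `groups_by` and `without_baz` are made up for this sketch. It relies on Python dicts keeping insertion order (Python 3.7+), so groups are emitted in first-seen order.

```python
import json

def groups_by(stream, f):
    """Yield groups in first-seen order; f must return a string for each item."""
    groups = {}  # dicts keep insertion order (Python 3.7+)
    for x in stream:
        groups.setdefault(f(x), []).append(x)
    yield from groups.values()

data = [
    {"foo": 1, "bar": "a", "baz": "whatever"},
    {"foo": 1, "bar": "a", "baz": "hello"},
    {"foo": 1, "bar": "b", "baz": "world"},
]

def without_baz(item):
    # Serialize every key except "baz" to get a string-valued group key.
    rest = {k: v for k, v in item.items() if k != "baz"}
    return json.dumps(rest, sort_keys=True)

result = list(groups_by(data, without_baz))
for group in result:
    print(json.dumps(group))
```

As in the jq example, the two items that differ only in `baz` land in one group and the third item forms its own.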

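Refinement

One might object that requiring f to always be string-valued introduces several potential difficulties, so here is an efficient but more general definition: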

# Emit a stream of the groups defined by f, without using sort.
# f need not be string-valued.
def GROUPS_BY(stream; f): 
   reduce stream as $x ({};
     ($x|f) as $s
     | ($s|type) as $t
     | (if $t == "string" then $s else ($s|tojson) end) as $y
     | .[$t][$y] += [$x] )
   | .[][]
   ;

Now we can simply write:

GROUPS_BY(data[]; del(.baz))
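The reason for tagging each key with its type can be seen in a small Python sketch (again an analogue with a hypothetical `groups_by` name, not the jq code itself). Unlike the jq definition, which nests the groups by type, this version uses a flat dict keyed by a (type name, text) pair, so the groups come out in plain first-seen order.

```python
import json

def groups_by(stream, f):
    """Yield groups in first-seen order; f may return any JSON-like value."""
    groups = {}
    for x in stream:
        value = f(x)
        # Strings are used as-is, everything else is serialized; pairing the
        # text with the type name keeps "1" (a string) apart from 1 (a number).
        text = value if isinstance(value, str) else json.dumps(value, sort_keys=True)
        groups.setdefault((type(value).__name__, text), []).append(x)
    yield from groups.values()

items = [{"k": 1}, {"k": "1"}, {"k": 1}, {"k": [1, 2]}]
result = list(groups_by(items, lambda item: item["k"]))
print(result)
```

Without the type tag, the number 1 and the string "1" would both map to the text `1` and be merged into one group.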

Using a JSON-Lines file

The simplest way to use GROUPS_BY with a JSON-Lines file is to use inputs. For example, assuming the more general def is used, you would write:

GROUPS_BY(inputs; del(.alert))

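When using inputs, don't forget to invoke jq with the -n option.

A filter for determining validity

Based on my understanding of the problem, the following filter can be used to determine whether a group is valid: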

def changing(f):
  def c:
    if length <= 1 then true
    elif (.[0] | f) == (.[1] | f) then false
    else .[1:] | c
    end;
  c ;

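(The inner function c is used here for efficient recursion. Of course, if evaluating f redundantly is a concern, a variant definition should be used.)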
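The same adjacency check can be sketched in Python, assuming each group is a list and f is an ordinary function (the signature `changing(items, f)` is a choice made for this sketch). It evaluates f once per item, which is the kind of variant hinted at above.

```python
def changing(items, f):
    """Return True if f gives a different value for every pair of adjacent items."""
    values = [f(x) for x in items]  # evaluate f exactly once per item
    return all(a != b for a, b in zip(values, values[1:]))

def status(rec):
    return rec["alert"]["status"]

flapping = [{"alert": {"status": s}} for s in ("On", "Off", "On")]
stuck = [{"alert": {"status": s}} for s in ("On", "On", "Off")]

print(changing(flapping, status))
print(changing(stuck, status))
```

Groups with zero or one item count as valid, matching the `length <= 1` case of the jq filter.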

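Solution

Using the more general definition of GROUPS_BY throughout, and assuming we wish to identify the invalid groups, the solution would appear to be a two-liner: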

GROUPS_BY(inputs; del(.alert))
| select( changing(.alert.status) | not )
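For readers who want to cross-check the result outside jq, here is a minimal end-to-end sketch in Python under the same reading of the problem: group the lines by everything except `alert`, keep line order, and report the groups whose status does not alternate. The function name `invalid_groups` is hypothetical, and the sample lines are shortened stand-ins for the question's invalid example.

```python
import json

def invalid_groups(lines):
    """Return the groups whose alert.status fails to alternate, in first-seen order."""
    groups = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        rec = json.loads(line)
        rest = {k: v for k, v in rec.items() if k != "alert"}
        groups.setdefault(json.dumps(rest, sort_keys=True), []).append(rec)

    def changing(group):
        statuses = [rec["alert"]["status"] for rec in group]
        return all(a != b for a, b in zip(statuses, statuses[1:]))

    return [group for group in groups.values() if not changing(group)]

# Shortened stand-ins for the question's invalid example (hypothetical data).
sample = """\
{"id":123,"code":"foo","alert":{"mgmt":"yes","status":"On"}}
{"id":456,"code":"bar","alert":{"mgmt":"yes","status":"On"}}
{"id":123,"code":"foo","alert":{"mgmt":"yes","status":"On"}}
{"id":456,"code":"bar","alert":{"mgmt":"yes","status":"Off"}}
{"id":123,"code":"foo","alert":{"mgmt":"yes","status":"Off"}}
{"id":456,"code":"bar","alert":{"mgmt":"yes","status":"Off"}}
"""

bad = invalid_groups(sample.splitlines())
for group in bad:
    print(group[0]["id"], [rec["alert"]["status"] for rec in group])
```

On this sample both ids are reported: 123 repeats "On" and 456 repeats "Off".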

Answer 1 (score: 1)

I would do this job in a more convenient programming language, such as Python:

#!/usr/bin/env python
import json
import sys 

Credit to https://stackoverflow.com/a/38373810/171318

def load_json_multiple(segments):
    chunk = ""
    for segment in segments:
        chunk += segment
        try:
            yield json.loads(chunk)
            chunk = ""
        except ValueError:
            pass

...and the main() method:

def main():

    lookup = {}
    # You might wanna use argparse for this in real life
    filename = sys.argv[1]
    with open(filename) as f:
        for parsed_json in load_json_multiple(f):
            key = '{}/{}/{}'.format(parsed_json['id'],
                                    parsed_json['code'],
                                    parsed_json['severityCode'])

            status = parsed_json['alert']['status'] == 'On'
            # Invalid if this key was seen before with the same status
            if key in lookup and lookup[key] == status:
                print('invalid')
                return 1

            lookup[key] = status

    print('valid')
    return 0


if __name__ == '__main__':
    sys.exit(main())

Store it in a file, say validate.py, make it executable with chmod +x, and call it like this:

./validate.py valid.json
./validate.py invalid.json
...