I have a file containing JSON lines, and I need to validate it based on the sequence of "alert.status" values across the lines.
Example of valid JSON lines:
{"id":123,"code":"foo","severity":"Critical","severityCode":1, "property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"On"}}
{"id":456,"code":"bar","severity":"High","severityCode":2,"property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"On"}}
{"id":123,"code":"foo","severity":"Critical","severityCode":1,"property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"Off"}}
{"id":456,"code":"bar","severity":"High","severityCode":2,"property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"Off"}}
{"id":123,"code":"foo","severity":"Critical","severityCode":1,"property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"On"}}
{"id":456,"code":"bar","severity":"High","severityCode":2,"property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"On"}}
The file above is valid because the duplicate JSONs (lines 1, 3, 5 and lines 2, 4, 6) have a status that goes "On", "Off", "On", and so on.
Example of invalid JSON lines:
{"id":123,"code":"foo","severity":"Critical","severityCode":1, "property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"On"}}
{"id":456,"code":"bar","severity":"High","severityCode":2,"property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"On"}}
{"id":123,"code":"foo","severity":"Critical","severityCode":1,"property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"On"}}
{"id":456,"code":"bar","severity":"High","severityCode":2,"property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"Off"}}
{"id":123,"code":"foo","severity":"Critical","severityCode":1,"property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"Off"}}
{"id":456,"code":"bar","severity":"High","severityCode":2,"property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"Off"}}
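The file above is invalid because the JSONs on lines 1 and 3 are duplicates whose "status" value stays the same instead of toggling between On and Off.
I tried using jq to read the JSON lines into a JSON array: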
jq --slurp 'map(select(. >= 2))' jsonfile > jsonarray
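But since the order of the lines matters, I don't think I can use group_by to find the duplicates (the result of group_by is sorted).
I'm considering inserting a new key with an incrementing number into each JSON, so that after using group_by we can sort each result by this new key to recover the sequence.
Is there a way in jq to group by all keys except two? (In this case "status" and the new key with the incrementing number.)
Is there a better way to solve this problem?
Thanks a lot for your help!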
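Answer 0 (score: 3)
"I don't think I can use group_by to find the duplicates (the result of group_by is sorted)."
That is correct, but it is easy to define a non-sorting "group_by", and as we will see, it can just as easily be used to group by all keys except specifically designated ones.
First, here is a simple filter that preserves the original order of the items within each group: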
# The filter, f, must produce a string for each item in `stream`
def GROUPS_BY(stream; f):
  reduce stream as $x ({}; .[$x|f] += [$x] ) | .[] ;
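The "S" in the name emphasizes that the function is stream-oriented in two ways: its first argument is a stream, and it produces a stream of groups. The name is capitalized to emphasize the difference from the existing built-in group_by.
To illustrate how to group by all keys except specific ones, consider this example (taken from another SO question):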
def data:
  [{"foo":1,"bar":"a","baz":"whatever"},
   {"foo":1,"bar":"a","baz":"hello"},
   {"foo":1,"bar":"b","baz":"world"}] ;
GROUPS_BY(data[]; del(.baz) | tostring)
[{"foo":1,"bar":"a","baz":"whatever"},{"foo":1,"bar":"a","baz":"hello"}]
[{"foo":1,"bar":"b","baz":"world"}]
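For comparison, the same order-preserving grouping can be sketched in Python. This is only an illustration and not part of the jq solution: the helper names groups_by and without are made up for the example, and using json.dumps with sort_keys=True as the group key is an assumption about how "all keys except baz" should be compared. It relies on Python dicts preserving insertion order (Python 3.7+).

```python
import json

def groups_by(items, key):
    """Group items by key(item), keeping first-seen group order and
    the original order of items inside each group (no sorting)."""
    groups = {}  # dicts preserve insertion order in Python 3.7+
    for item in items:
        groups.setdefault(key(item), []).append(item)
    return list(groups.values())

def without(obj, *names):
    """Serialize obj minus the named keys, giving a hashable group key."""
    kept = {k: v for k, v in obj.items() if k not in names}
    return json.dumps(kept, sort_keys=True)

data = [
    {"foo": 1, "bar": "a", "baz": "whatever"},
    {"foo": 1, "bar": "a", "baz": "hello"},
    {"foo": 1, "bar": "b", "baz": "world"},
]

# One line per group, in order of first appearance
for group in groups_by(data, lambda d: without(d, "baz")):
    print(json.dumps(group))
```

As in the jq version, the first two objects end up in one group and the third in a group of its own.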
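One might object that requiring f to always be string-valued introduces several potential difficulties, so here is a still efficient but more generic definition: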
# Emit a stream of the groups defined by f, without using sort.
# f need not be string-valued.
def GROUPS_BY(stream; f):
  reduce stream as $x ({};
    ($x|f) as $s
    | ($s|type) as $t
    | (if $t == "string" then $s else ($s|tojson) end) as $y
    | .[$t][$y] += [$x] )
  | .[][]
  ;
Now we can simply write:
GROUPS_BY(data[]; del(.baz))
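The reason for bucketing by type first can be seen in a small Python sketch. It is an illustrative analogue only: the function name groups_by and the sample items are hypothetical, and Python type names stand in for jq's type strings. Without the type level, the number 1 (serialized as 1) and the string "1" would land in the same group.

```python
import json

def groups_by(items, key):
    """Order-preserving grouping where key(item) may be any JSON value."""
    buckets = {}  # type name -> {key text -> list of items}
    for item in items:
        value = key(item)
        type_tag = type(value).__name__
        # Strings are used as-is; everything else is serialized to JSON text
        text = value if isinstance(value, str) else json.dumps(value, sort_keys=True)
        buckets.setdefault(type_tag, {}).setdefault(text, []).append(item)
    # Flatten both levels, like `.[][]` in the jq definition
    return [group for by_text in buckets.values() for group in by_text.values()]

items = [{"k": 1}, {"k": "1"}, {"k": 1}, {"k": [1]}]
groups = groups_by(items, lambda d: d["k"])
print(len(groups))
```

Here the two items with the number 1 share a group, while the string "1" and the array [1] each get their own.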
The simplest way to use GROUPS_BY with a JSON-Lines file is to use inputs. For example, assuming the more generic def is used, you would write:
GROUPS_BY(inputs; del(.alert))
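When using inputs, don't forget to invoke jq with the -n option.
Based on my understanding of the question, the following filter can be used to determine the validity of a group: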
def changing(f):
  def c:
    if length <= 1 then true
    elif (.[0] | f) == (.[1] | f) then false
    else .[1:] | c
    end;
  c ;
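(The inner function c is used here for efficient recursion. Of course, if computing f redundantly is a concern, a variant definition should be used.)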
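The same check can be sketched iteratively in Python. This is an assumed equivalent rather than a translation of the jq filter: the name changing is reused only for readability, and here the key function is evaluated exactly once per item, which is one way to avoid the redundant computation mentioned above.

```python
def changing(items, key):
    """Return True if key(item) differs between every pair of adjacent items."""
    values = [key(item) for item in items]  # key is evaluated once per item
    return all(a != b for a, b in zip(values, values[1:]))

toggling = [{"status": "On"}, {"status": "Off"}, {"status": "On"}]
stuck = [{"status": "On"}, {"status": "On"}, {"status": "Off"}]

print(changing(toggling, lambda d: d["status"]))
print(changing(stuck, lambda d: d["status"]))
```

A group with zero or one item counts as changing, matching the length <= 1 case of the jq filter.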
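Putting it all together with the more generic definition of GROUPS_BY, and assuming we wish to identify the invalid groups, the solution would appear to be this two-liner: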
GROUPS_BY(inputs; del(.alert))
| select( changing(.alert.status) | not )
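The whole pipeline can also be sketched end to end in Python, as a rough cross-check of the jq result. Everything here is an assumption made for illustration: the function name invalid_groups is hypothetical, records are treated as duplicates when they are identical apart from "alert", and the sample records are abbreviated versions of the ones in the question.

```python
import json

def invalid_groups(lines):
    """Yield each group of records (identical apart from "alert") whose
    alert.status fails to change between consecutive occurrences."""
    groups = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        rest = {k: v for k, v in record.items() if k != "alert"}
        groups.setdefault(json.dumps(rest, sort_keys=True), []).append(record)
    for group in groups.values():
        statuses = [r["alert"]["status"] for r in group]
        if any(a == b for a, b in zip(statuses, statuses[1:])):
            yield group

# Abbreviated sample: id 123 repeats "On", id 456 toggles correctly
sample = """\
{"id":123,"code":"foo","alert":{"mgmt":"yes","status":"On"}}
{"id":456,"code":"bar","alert":{"mgmt":"yes","status":"On"}}
{"id":123,"code":"foo","alert":{"mgmt":"yes","status":"On"}}
{"id":456,"code":"bar","alert":{"mgmt":"yes","status":"Off"}}
""".splitlines()

for group in invalid_groups(sample):
    print(group[0]["id"], [r["alert"]["status"] for r in group])
# prints: 123 ['On', 'On']
```

Passing an open file object instead of the sample list would process a JSON-Lines file line by line in the same way.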
Answer 1 (score: 1)
I would do this job in a more convenient programming language, such as Python:
#!/usr/bin/env python
import json
import sys
# Thanks to https://stackoverflow.com/a/38373810/171318
def load_json_multiple(segments):
    # Accumulate lines until they parse as one complete JSON document
    chunk = ""
    for segment in segments:
        chunk += segment
        try:
            yield json.loads(chunk)
            chunk = ""
        except ValueError:
            pass
...and the main() function:
def main():
    lookup = {}
    # You might want to use argparse for this in real life
    filename = sys.argv[1]
    with open(filename) as f:
        for parsed_json in load_json_multiple(f):
            # Records with the same id, code and severityCode count as duplicates
            key = '{}/{}/{}'.format(parsed_json['id'],
                                    parsed_json['code'],
                                    parsed_json['severityCode'])
            status = parsed_json['alert']['status'] == 'On'
            # A repeated record must flip the status seen last time
            if key in lookup and lookup[key] == status:
                print('invalid')
                return 1
            lookup[key] = status
    print('valid')
    return 0


if __name__ == '__main__':
    sys.exit(main())
Save it in a file, say validate.py, make it executable with chmod +x, and call it like this:
./validate.py valid.json
./validate.py invalid.json
...