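I have the following nested JSON file that I want to parse with the jq tool and print as a table, like the one I show at the end.
The structure of input.json is as follows: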
{
  "document":{
    "page":[
      {
        "@index":"0",
        "image":{
          "@data":"ABC",
          "@format":"png",
          "@height":"620.00",
          "@type":"base64encoded",
          "@width":"450.00",
          "@x":"85.00",
          "@y":"85.00"
        }
      },
      {
        "@index":"1",
        "row":[
          {
            "column":[
              {
                "text":""
              },
              {
                "text":{
                  "#text":"Text1",
                  "@fontName":"Arial",
                  "@fontSize":"12.0",
                  "@height":"12.00",
                  "@width":"71.04",
                  "@x":"121.10",
                  "@y":"83.42"
                }
              }
            ]
          },
          {
            "column":[
              {
                "text":""
              },
              {
                "text":{
                  "#text":"Text2",
                  "@fontName":"Arial",
                  "@fontSize":"12.0",
                  "@height":"12.00",
                  "@width":"101.07",
                  "@x":"121.10",
                  "@y":"124.82"
                }
              }
            ]
          }
        ]
      },
      {
        "@index":"2",
        "row":[
          {
            "column":{
              "text":{
                "#text":"Text3",
                "@fontName":"Arial",
                "@fontSize":"12.0",
                "@height":"12.00",
                "@width":"363.44",
                "@x":"85.10",
                "@y":"69.62"
              }
            }
          },
          {
            "column":{
              "text":{
                "#text":"Text4",
                "@fontName":"Arial",
                "@fontSize":"12.0",
                "@height":"12.00",
                "@width":"382.36",
                "@x":"85.10",
                "@y":"83.42"
              }
            }
          },
          {
            "column":{
              "text":{
                "#text":"Text5",
                "@fontName":"Arial",
                "@fontSize":"12.0",
                "@height":"12.00",
                "@width":"435.05",
                "@x":"85.10",
                "@y":"97.22"
              }
            }
          }
        ]
      },
      {
        "@index":"3"
      }
    ]
  }
}
Following the answer to the question (Parsing nested json with jq), I tried this code, but it does not work:
$ cat file.json | jq .document.page[].row | ["#text", "@x", "@y"] | @csv
The output I want to get is:
#text @x @y
Text1 121.10 83.42
Text2 121.10 124.82
Text3 85.10 69.62
Text4 85.10 83.42
Text5 85.10 97.22
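How can I achieve this?
Thanks
Update
Thank you very much for your help. I tried it with the real file, which is longer.
I was able to adapt peak's first solution as shown below: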
["#text", "@data", "@fontName", "@fontSize", "@format", "@height", "@type", "@width", "@x", "@y"],
( ..
| objects
| select(has("#text","@data"))
| [.["#text", "@data", "@fontName", "@fontSize", "@format", "@height", "@type", "@width", "@x", "@y"]]
)
| @tsv
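This is a rough Python sketch of what the adapted jq program above does, for readers who want to check the logic outside jq. It assumes a small, hypothetical sample shaped like input.json (not the real file). It also assumes a helper, recurse(), that plays the role of jq's "..". One subtle difference: in jq, select(has("#text","@data")) emits the object once for every true result, so an object that carried both keys would appear twice. The "or" test below emits such an object only once.

```python
# Sketch of the adapted jq program (assumption: trimmed, hypothetical sample data).
COLUMNS = ["#text", "@data", "@fontName", "@fontSize", "@format",
           "@height", "@type", "@width", "@x", "@y"]

DOC = {"document": {"page": [
    {"@index": "0",
     "image": {"@data": "ABC", "@format": "png", "@x": "85.00", "@y": "85.00"}},
    {"@index": "1",
     "row": [{"column": [{"text": ""},
                         {"text": {"#text": "Text1", "@fontName": "Arial",
                                   "@x": "121.10", "@y": "83.42"}}]}]},
    {"@index": "3"},
]}}


def recurse(value):
    """Yield value and everything nested inside it, like jq's `..`."""
    yield value
    if isinstance(value, dict):
        for child in value.values():
            yield from recurse(child)
    elif isinstance(value, list):
        for child in value:
            yield from recurse(child)


def cell(obj, key):
    """A missing or null key becomes "", which is how @tsv prints null."""
    value = obj.get(key)
    return "" if value is None else str(value)


def tsv_lines(doc):
    lines = ["\t".join(COLUMNS)]  # the header row
    for obj in recurse(doc):
        # Keep text runs and image objects (objects with "#text" or "@data").
        if isinstance(obj, dict) and ("#text" in obj or "@data" in obj):
            lines.append("\t".join(cell(obj, key) for key in COLUMNS))
    # Unlike @tsv, no escaping of tabs or backslashes is done here.
    return lines


for line in tsv_lines(DOC):
    print(line)
```

Missing attributes simply become empty cells. That is why the image row and the text rows can share one header.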
With the new input, I get this table:
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| #text | @data | @fontName | @fontSize | @format | @height | @type | @width | @x | @y |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| | ABC | | | png | 620 | base64encoded | 450 | 85 | 85 |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text ä 1 | | Tahoma | 12 | | 12 | | 427.79 | 85.1 | 69.62 |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text ¢76 | | Tahoma | 12 | | 12 | | 270.5 | 85.1 | 690.72 |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text % 5 | | Tahoma | 12 | | 12 | | 130.84 | 358.86 | 690.72 |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text 7Ç8 | | Tahoma | 12 | | 12 | | 115.95 | 85.1 | 704.52 |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text • 2 Wñ79 | | Tahoma | 8 | | 8.04 | | 398.16 | 121.1 | 68.06 |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text » 1 A\\\\CÓ | | Tahoma | 12 | | 12 | | 101.5 | 85.1 | 83.42 |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text 12 | | Tahoma | 12 | | 12 | | 312.26 | 189.83 | 83.42 |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text 82 | | Tahoma | 12 | | 12 | | 44.99 | 85.1 | 97.22 |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text 31 | | Tahoma | 8 | | 8.04 | | 381.83 | 133.1 | 95.66 |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
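If possible, how can I add the following 3 columns (counter, page and row), so that I know which page and row each line corresponds to?
The expected output is as follows: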
+-------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| counter | page | row | #text | @data | @fontName | @fontSize | @format | @height | @type | @width | @x | @y |
+-------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 1 | 0 | | | ABC | | | png | 620 | base64encoded | 450 | 85 | 85 |
+-------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 2 | 1 | 0 | Text ä 1 | | Tahoma | 12 | | 12 | | 427.79 | 85.1 | 69.62 |
+-------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 3 | 1 | 1 | Text ¢76 | | Tahoma | 12 | | 12 | | 270.5 | 85.1 | 690.72 |
+-------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 4 | 1 | 1 | Text % 5 | | Tahoma | 12 | | 12 | | 130.84 | 358.86 | 690.72 |
+-------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 5 | 2 | 2 | Text 7Ç8 | | Tahoma | 12 | | 12 | | 115.95 | 85.1 | 704.52 |
+-------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 6 | 2 | 0 | Text • 2 Wñ79 | | Tahoma | 8 | | 8.04 | | 398.16 | 121.1 | 68.06 |
+-------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 7 | 2 | 1 | Text » 1 A\\\\CÓ | | Tahoma | 12 | | 12 | | 101.5 | 85.1 | 83.42 |
+-------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 8 | 2 | 1 | Text 12 | | Tahoma | 12 | | 12 | | 312.26 | 189.83 | 83.42 |
+-------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 9 | 2 | 2 | Text 82 | | Tahoma | 12 | | 12 | | 44.99 | 85.1 | 97.22 |
+-------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 10 | 2 | 2 | Text 31 | | Tahoma | 8 | | 8.04 | | 381.83 | 133.1 | 95.66 |
+-------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
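Here is the new, more representative input file, input2.json.
Looking at the JSON structure in the image below, you can see the page numbers and row numbers present in the JSON file and the values inside them.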
Answer 0 (score: 2)
Here is a simple approach (perhaps too simple?) that focuses on the embedded JSON objects that have a "#text" attribute:
["#text", "@x", "@y"], # the header
( ..
| objects
| select(has("#text"))
| [.["#text", "@x", "@y"]] # a row
)
| @csv
Given this program and the sample input, invoking jq with the -r option produces:
"#text","@x","@y"
"Text1","121.10","83.42"
"Text2","121.10","124.82"
"Text3","85.10","69.62"
"Text4","85.10","83.42"
"Text5","85.10","97.22"
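If you don't want the quotation marks and are willing to risk output that is not strictly CSV, one option is to use join(",") instead of @csv at the end of the pipeline. You might also want to use @tsv instead of @csv.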
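The three output styles differ only in quoting and escaping. The Python functions below approximate them for a single row of strings so the difference is visible. They are approximations written for this illustration and are not jq's code. The @tsv escaping follows the jq 1.6+ manual (backslash, tab, newline and carriage return are written as escape sequences). The sample row is hypothetical.

```python
# Rough approximations of three jq output styles, applied to one row of strings.
# Assumption: every cell is a string (true for the sample data in this question).


def at_csv(row):
    """Like @csv for strings: quote every cell and double any embedded quotes."""
    return ",".join('"' + s.replace('"', '""') + '"' for s in row)


def join_comma(row):
    """Like join(","): cells are glued together with no quoting at all."""
    return ",".join(row)


def at_tsv(row):
    """Like @tsv in jq 1.6+: tab-separated; backslash, tab, newline, CR escaped."""
    def esc(s):
        return (s.replace("\\", "\\\\").replace("\t", "\\t")
                 .replace("\n", "\\n").replace("\r", "\\r"))
    return "\t".join(esc(s) for s in row)


# A hypothetical cell containing a comma and quotes shows where join(",") stops being CSV.
ROW = ['Text, "1"', "121.10", "83.42"]
print(at_csv(ROW))      # "Text, ""1""","121.10","83.42"
print(join_comma(ROW))  # Text, "1",121.10,83.42  (now looks like 4 fields)
print(at_tsv(ROW))
```

With the sample data in the question none of these special characters occur, so all three give equivalent tables.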
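If a more restrictive way of selecting the relevant embedded objects is needed, replacing .. with .. | .text? should suffice.
If not, further filters can be added according to the detailed requirements.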
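Both selection strategies can be compared in a Python sketch of this answer's program. The broad version keeps every object that has "#text". The restricted version only looks at values stored under a "text" key. The sample is a trimmed, hypothetical copy of input.json. It also contains one made-up "decoy" object that has "#text" but sits under another key, so the two results differ. The csv module with QUOTE_ALL stands in for @csv here, which matches only because every cell is a string.

```python
import csv
import io

# Hypothetical sample shaped like input.json, plus a made-up "decoy" object
# that has "#text" but is NOT stored under a "text" key.
DOC = {"document": {"page": [
    {"@index": "0", "image": {"@data": "ABC", "@x": "85.00", "@y": "85.00"}},
    {"@index": "1", "row": [
        {"column": [{"text": ""}, {"text": {"#text": "Text1", "@x": "121.10", "@y": "83.42"}}]},
        {"column": [{"text": ""}, {"text": {"#text": "Text2", "@x": "121.10", "@y": "124.82"}}]},
    ]},
    {"@index": "2", "row": [
        {"column": {"text": {"#text": "Text3", "@x": "85.10", "@y": "69.62"}}},
        {"column": {"text": {"#text": "Text4", "@x": "85.10", "@y": "83.42"}}},
        {"column": {"text": {"#text": "Text5", "@x": "85.10", "@y": "97.22"}}},
    ]},
    {"@index": "3", "note": {"#text": "decoy", "@x": "0", "@y": "0"}},
]}}


def recurse(value):
    """Yield value and everything nested inside it, like jq's `..`."""
    yield value
    if isinstance(value, dict):
        for child in value.values():
            yield from recurse(child)
    elif isinstance(value, list):
        for child in value:
            yield from recurse(child)


def broad(doc):
    """.. | objects | select(has("#text"))"""
    for obj in recurse(doc):
        if isinstance(obj, dict) and "#text" in obj:
            yield obj


def restricted(doc):
    """.. | .text? | objects | select(has("#text"))"""
    for value in recurse(doc):
        if isinstance(value, dict):  # on arrays and strings, .text? yields nothing
            child = value.get("text")  # a missing key gives None, like jq's null
            if isinstance(child, dict) and "#text" in child:  # `objects` drops "text": ""
                yield child


def to_csv(objects):
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(["#text", "@x", "@y"])  # the header
    for obj in objects:
        writer.writerow([obj.get("#text"), obj.get("@x"), obj.get("@y")])  # a row
    return buf.getvalue()


print(to_csv(broad(DOC)), end="")       # includes the decoy line
print(to_csv(restricted(DOC)), end="")  # Text1..Text5 only
```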
Answer 1 (score: 1)
Here is a solution that uses a "drill-down" approach, and is therefore rather tedious:
["#text", "@x", "@y"],
( .document.page[]
| .row[]?
| .column
| (if type == "array" then .[] else . end)
| .text
| objects
| [.["#text", "@x", "@y"]]
)
| @tsv
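This is meant to be used with the -r command-line option.
I used @tsv because it produces output similar to the expected output given. As mentioned elsewhere on this page, there are alternatives, such as using join/1.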
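The drill-down path can be mirrored step by step in Python. Each line of the sketch corresponds to one stage of the jq pipeline above. The sample is a trimmed, hypothetical copy of input.json. The sketch assumes "row", when present, is an array as in that file. jq's .row[]? would also iterate a lone object, which this sketch does not handle.

```python
# Hypothetical sample shaped like input.json (only the fields used here).
DOC = {"document": {"page": [
    {"@index": "0", "image": {"@data": "ABC", "@x": "85.00", "@y": "85.00"}},
    {"@index": "1", "row": [
        {"column": [{"text": ""}, {"text": {"#text": "Text1", "@x": "121.10", "@y": "83.42"}}]},
        {"column": [{"text": ""}, {"text": {"#text": "Text2", "@x": "121.10", "@y": "124.82"}}]},
    ]},
    {"@index": "2", "row": [
        {"column": {"text": {"#text": "Text3", "@x": "85.10", "@y": "69.62"}}},
        {"column": {"text": {"#text": "Text4", "@x": "85.10", "@y": "83.42"}}},
        {"column": {"text": {"#text": "Text5", "@x": "85.10", "@y": "97.22"}}},
    ]},
    {"@index": "3"},
]}}

KEYS = ("#text", "@x", "@y")


def column_cells(column):
    """.column | if type == "array" then .[] else . end"""
    return column if isinstance(column, list) else [column]


def drill_down(doc):
    lines = ["\t".join(KEYS)]
    for page in doc["document"]["page"]:                  # .document.page[]
        for row in page.get("row") or []:                 # .row[]? (no rows -> nothing)
            for cell in column_cells(row.get("column")):  # .column, array or object
                text = cell.get("text") if isinstance(cell, dict) else None  # .text
                if isinstance(text, dict):                # objects: skips "text": ""
                    lines.append("\t".join(str(text.get(k, "")) for k in KEYS))
    return lines


print("\n".join(drill_down(DOC)))
```

Because the path is spelled out, the image on page 0 is never visited. That is the main trade-off compared with the ".." approach.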
Answer 2 (score: 1)
For those interested in alternative solutions, here is how to achieve the same requirement with jtc, a walk-path based Unix tool for JSON:
bash $ jtc -qq -w'<>a' -T'"#text\t@x\t@y"' -w'<@x>l:<x>v[-1][@y]<y>v[-1][#text]' -T'"{}\t{x}\t{y}"' file.json
#text @x @y
Text1 121.10 83.42
Text2 121.10 124.82
Text3 85.10 69.62
Text4 85.10 83.42
Text5 85.10 97.22
bash $
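Walk-path (-w) breakdown:
- <@x>l:<x>v: find each label @x and memorize the found JSON value in namespace x
- [-1][@y]<y>v: address the parent (starting from the last found value), then address the JSON by label @y and store its value in namespace y
- [-1][#text]: do the same for the #text label (note: this last value is not memorized)
- -T'"{}\t{x}\t{y}"': apply a template with interpolation ({} interpolates the last found value, so there is no need to store it in a namespace)
- -qq: unquote the resulting JSON strings (removes the quotes and turns \t into a tab)
- the first walk (-w'<>a') is just a dummy walk that triggers template interpolation for the header line.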
PS> Disclosure: I am the creator of jtc, a shell CLI tool for JSON manipulation
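Answer 3 (score: 1)
Because the second set of requirements (corresponding to input2.json) needs some context-dependent information, the context cannot be ignored, so the following solution uses a "drill-down" approach. What follows will be hard to understand unless you are familiar with foreach, so I will just mention that the approach essentially uses a state variable {counter, page, row} to keep track of these three counters.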
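For readers less familiar with foreach, the same three-counter idea can be sketched in Python. This is a separate sketch, not a translation of the jq program. It assumes a trimmed, hypothetical sample shaped like input.json and numbers pages and rows by position. The page's own "@index" value could be used instead. Anything on a page outside "row", such as the image on page 0, is also emitted with an empty row column, as in the expected output. Wrapping a lone "row" object in a list is an extra assumption about inputs converted from XML.

```python
# Assumption: trimmed subset of the real header, for brevity.
COLUMNS = ["#text", "@data", "@x", "@y"]

# Hypothetical sample shaped like input.json (only the fields used here).
DOC = {"document": {"page": [
    {"@index": "0", "image": {"@data": "ABC", "@x": "85.00", "@y": "85.00"}},
    {"@index": "1", "row": [
        {"column": [{"text": ""}, {"text": {"#text": "Text1", "@x": "121.10", "@y": "83.42"}}]},
        {"column": [{"text": ""}, {"text": {"#text": "Text2", "@x": "121.10", "@y": "124.82"}}]},
    ]},
    {"@index": "2", "row": [
        {"column": {"text": {"#text": "Text3", "@x": "85.10", "@y": "69.62"}}},
        {"column": {"text": {"#text": "Text4", "@x": "85.10", "@y": "83.42"}}},
        {"column": {"text": {"#text": "Text5", "@x": "85.10", "@y": "97.22"}}},
    ]},
    {"@index": "3"},
]}}


def recurse(value):
    """Yield value and everything nested inside it, like jq's `..`."""
    yield value
    if isinstance(value, dict):
        for child in value.values():
            yield from recurse(child)
    elif isinstance(value, list):
        for child in value:
            yield from recurse(child)


def numbered_rows(doc):
    counter = 0  # running record number across the whole document
    out = []
    for page_no, page in enumerate(doc["document"]["page"]):
        rows = page.get("row", [])
        if isinstance(rows, dict):  # assumption: a single row may not be wrapped in a list
            rows = [rows]
        # Content outside "row" (e.g. the image) is reported with an empty row number.
        outside = {k: v for k, v in page.items() if k != "row"}
        for row_no, part in [("", outside)] + list(enumerate(rows)):
            for obj in recurse(part):
                if isinstance(obj, dict) and ("#text" in obj or "@data" in obj):
                    counter += 1
                    out.append([counter, page_no, row_no] + [obj.get(c, "") for c in COLUMNS])
    return out


print("\t".join(["counter", "page", "row"] + COLUMNS))
for record in numbered_rows(DOC):
    print("\t".join(map(str, record)))
```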
["counter", "page", "row", "#text", "@data", "@fontName", "@fontSize", "@format", "@height", "@type", "@width", "@x", "@y"],
(foreach (.document.page[] | objects) as $page ({page: -1, counter: 0};
.page += 1
| foreach ($page | .row[]?) as $row (.row=-1;
.row += 1
| foreach ($row | (.column | (if type == "array" then .[] else . end )) | .text | objects) as $x (.;
.counter += 1
| .out = [.counter, .page, .row, $x["#text", "@data", "@fontName", "@fontSize", "@format", "@height", "@type", "@width", "@x", "@y"]]
; . )
; . )
; .out )
)
| @tsv
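This produces the desired TSV, except for the first data row, because that record (the image on page 0) does not belong to any row. One way to include the first row is shown in the answer to Relate elements in table form from Json file with jq.
Answer 4 (score: -1)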
In this command:
$ cat file.json | jq .document.page[].row | ["#text", "@x", "@y"] | @csv
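Everything after jq is meant to be jq's first argument, which means you need to wrap it in quotes. In addition, cat file.json | here is a Useless Use of Cat; just pass the file name to jq as an argument. So the correctly quoted command is: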
$ jq '.document.page[].row | ["#text", "@x", "@y"] | @csv' file.json
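To see what the shell actually hands to jq, Python's standard shlex module can split both command lines into words the way a POSIX shell would. It does not run anything. The naive split at | assumes there is no quoted | character, which holds for the unquoted command only. run_jq is a hypothetical helper. It shows that when jq is started without a shell, the filter needs no quoting at all. Quoting only fixes how the filter reaches jq. The filter itself still maps every page to the constant array ["#text", "@x", "@y"], so it prints that header once per page rather than the data.

```python
import shlex
import subprocess

UNQUOTED = 'cat file.json | jq .document.page[].row | ["#text", "@x", "@y"] | @csv'
QUOTED = """jq '.document.page[].row | ["#text", "@x", "@y"] | @csv' file.json"""


def pipeline_stages(cmd):
    """Naively split at '|' (only safe with no quoted '|') and tokenize each stage."""
    return [shlex.split(stage) for stage in cmd.split("|")]


def run_jq(program, path):
    """Hypothetical helper: run jq without a shell; -r prints raw strings."""
    result = subprocess.run(["jq", "-r", program, path],
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Unquoted: the shell builds a 4-stage pipeline; jq only gets `.document.page[].row`.
    for stage in pipeline_stages(UNQUOTED):
        print(stage)
    # Quoted: jq receives the whole filter as a single argument.
    print(shlex.split(QUOTED))
```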