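Parsing a JSON file with the jq tool

Time: 2019-05-30 02:43:51

Tags: json jq

I have the following nested JSON file, which I would like to parse with jq and print in tabular form, as shown at the end.

The structure of input.json is as follows: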

{
 "document":{
  "page":[
     {
        "@index":"0",
        "image":{
           "@data":"ABC",
           "@format":"png",
           "@height":"620.00",
           "@type":"base64encoded",
           "@width":"450.00",
           "@x":"85.00",
           "@y":"85.00"
        }
     },
     {
        "@index":"1",
        "row":[
           {
              "column":[
                 {
                    "text":""
                 },
                 {
                    "text":{
                       "#text":"Text1",
                       "@fontName":"Arial",
                       "@fontSize":"12.0",
                       "@height":"12.00",
                       "@width":"71.04",
                       "@x":"121.10",
                       "@y":"83.42"
                    }
                 }
              ]
           },
           {
              "column":[
                 {
                    "text":""
                 },
                 {
                    "text":{
                       "#text":"Text2",
                       "@fontName":"Arial",
                       "@fontSize":"12.0",
                       "@height":"12.00",
                       "@width":"101.07",
                       "@x":"121.10",
                       "@y":"124.82"
                    }
                 }
              ]
           }
        ]
     },
     {
        "@index":"2",
        "row":[
           {
              "column":{
                 "text":{
                    "#text":"Text3",
                    "@fontName":"Arial",
                    "@fontSize":"12.0",
                    "@height":"12.00",
                    "@width":"363.44",
                    "@x":"85.10",
                    "@y":"69.62"
                 }
              }
           },
           {
              "column":{
                 "text":{
                    "#text":"Text4",
                    "@fontName":"Arial",
                    "@fontSize":"12.0",
                    "@height":"12.00",
                    "@width":"382.36",
                    "@x":"85.10",
                    "@y":"83.42"
                 }
              }
           },
           {
              "column":{
                 "text":{
                    "#text":"Text5",
                    "@fontName":"Arial",
                    "@fontSize":"12.0",
                    "@height":"12.00",
                    "@width":"435.05",
                    "@x":"85.10",
                    "@y":"97.22"
                 }
              }
           }
        ]
     },
     {
        "@index":"3"
     }
  ]
 }
}

Following the answer to this question (Parsing nested json with jq), I tried the code below, but it does not work:

$ cat file.json | jq .document.page[].row | ["#text", "@x", "@y"] | @csv

The output I would like to get is:

#text @x     @y
Text1 121.10 83.42
Text2 121.10 124.82
Text3 85.10  69.62
Text4 85.10  83.42
Text5 85.10  97.22

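How can I achieve this?

Thanks

Update

Thank you very much for the help. I have since tried with a real file, which is longer.

I was able to adapt peak's first solution as follows: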

["#text", "@data", "@fontName", "@fontSize", "@format", "@height", "@type", "@width", "@x", "@y"], 
( .. 
| objects 
| select(has("#text","@data")) 
| [.["#text", "@data", "@fontName", "@fontSize", "@format", "@height", "@type", "@width", "@x", "@y"]]
)  
| @tsv

With the new input, I get this table:

+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| #text         | @data | @fontName | @fontSize | @format | @height | @type         | @width | @x     | @y     |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
|               | ABC   |           |           | png     | 620     | base64encoded | 450    | 85     | 85     |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text ä 1      |       | Tahoma    | 12        |         | 12      |               | 427.79 | 85.1   | 69.62  |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text ¢76      |       | Tahoma    | 12        |         | 12      |               | 270.5  | 85.1   | 690.72 |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text % 5      |       | Tahoma    | 12        |         | 12      |               | 130.84 | 358.86 | 690.72 |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text 7Ç8      |       | Tahoma    | 12        |         | 12      |               | 115.95 | 85.1   | 704.52 |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text • 2 Wñ79 |       | Tahoma    | 8         |         | 8.04    |               | 398.16 | 121.1  | 68.06  |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text          |       | Tahoma    | 12        |         | 12      |               | 101.5  | 85.1   | 83.42  |
|   » 1 A\\\\CÓ |       |           |           |         |         |               |        |        |        |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text 12       |       | Tahoma    | 12        |         | 12      |               | 312.26 | 189.83 | 83.42  |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text 82       |       | Tahoma    | 12        |         | 12      |               | 44.99  | 85.1   | 97.22  |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| Text 31       |       | Tahoma    | 8         |         | 8.04    |               | 381.83 | 133.1  | 95.66  |
+---------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+

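If possible, how can I add the following 3 columns (counter, page and row) so that I know which page and row each line corresponds to?

The expected output is as follows: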

+---------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| counter | page | row | #text             | @data | @fontName | @fontSize | @format | @height | @type         | @width | @x     | @y     |
+---------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 1       | 0    |     |                   | ABC   |           |           | png     | 620     | base64encoded | 450    | 85     | 85     |
+---------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 2       | 1    | 0   | Text ä 1          |       | Tahoma    | 12        |         | 12      |               | 427.79 | 85.1   | 69.62  |
+---------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 3       | 1    | 1   | Text ¢76          |       | Tahoma    | 12        |         | 12      |               | 270.5  | 85.1   | 690.72 |
+---------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 4       | 1    | 1   | Text % 5          |       | Tahoma    | 12        |         | 12      |               | 130.84 | 358.86 | 690.72 |
+---------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 5       | 2    | 2   | Text 7Ç8          |       | Tahoma    | 12        |         | 12      |               | 115.95 | 85.1   | 704.52 |
+---------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 6       | 2    | 0   | Text • 2 Wñ79     |       | Tahoma    | 8         |         | 8.04    |               | 398.16 | 121.1  | 68.06  |
+---------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 7       | 2    | 1   | Text  » 1 A\\\\CÓ |       | Tahoma    | 12        |         | 12      |               | 101.5  | 85.1   | 83.42  |
+---------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 8       | 2    | 1   | Text 12           |       | Tahoma    | 12        |         | 12      |               | 312.26 | 189.83 | 83.42  |
+---------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 9       | 2    | 2   | Text 82           |       | Tahoma    | 12        |         | 12      |               | 44.99  | 85.1   | 97.22  |
+---------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+
| 10      | 2    | 2   | Text 31           |       | Tahoma    | 8         |         | 8.04    |               | 381.83 | 133.1  | 95.66  |
+---------+------+-----+-------------------+-------+-----------+-----------+---------+---------+---------------+--------+--------+--------+

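Here is the new, more representative input file: input2.json

The image below shows the JSON structure, which makes it possible to see the page and row numbers present in the JSON file and the values they contain.

[Image: structure of the JSON file, showing page and row numbers]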

5 answers:

Answer 0: (score: 2)

Here is a simple approach (perhaps too simple?) that focuses on the embedded JSON objects that have a "#text" key:

["#text", "@x", "@y"],       # the header
( ..
  | objects
  | select(has("#text"))  
  | [.["#text", "@x", "@y"]] # a row
) 
| @csv

With this program and the sample input, invoking jq with the -r option produces:

"#text","@x","@y"
"Text1","121.10","83.42"
"Text2","121.10","124.82"
"Text3","85.10","69.62"
"Text4","85.10","83.42"
"Text5","85.10","97.22"

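If you do not want the quotation marks and are willing to risk output that is not strictly CSV, one option is to use join(",") instead of @csv at the end of the pipeline.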
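To make the risk concrete, here is a minimal Python sketch (not jq) comparing a fully quoted CSV line with a plain comma join. The helper names and the sample value containing a comma are hypothetical and chosen only for illustration; the quoting shown is that of Python's csv module, used here as a stand-in for strict CSV output.

```python
import csv
import io

def quoted_csv(row):
    """Quote every field and double any embedded quotes (strict CSV)."""
    buf = io.StringIO()
    csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n").writerow(row)
    return buf.getvalue().rstrip("\n")

def plain_join(row):
    """Join fields with commas and no quoting at all."""
    return ",".join(row)

# Hypothetical row whose first field itself contains a comma.
row = ["Text, with comma", "121.10", "83.42"]

if __name__ == "__main__":
    print(quoted_csv(row))   # fields stay distinguishable
    print(plain_join(row))   # the embedded comma looks like a separator
```

With the plain join, a reader that splits on commas sees four fields instead of three, which is why that output is not strictly CSV.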

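Variants

You may want to use @tsv instead of @csv.

If you need a more restrictive way of selecting the relevant embedded objects, replacing .. with .. | .text? would suffice.

If that is not enough, additional filters can be added according to the detailed requirements.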
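For readers who want to check the logic of the recursive selection outside jq, the following is a rough Python sketch of the same idea: visit every nested value and keep the objects that have a "#text" key. The function names and the reduced sample are assumptions made for illustration, not part of the answer.

```python
def walk(node):
    """Yield node and every value nested inside it (comparable to jq's ..)."""
    yield node
    if isinstance(node, dict):
        for value in node.values():
            yield from walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from walk(item)

def text_rows(doc, keys=("#text", "@x", "@y")):
    """Return a header row plus one row per object that has a "#text" key."""
    rows = [list(keys)]
    for node in walk(doc):
        if isinstance(node, dict) and "#text" in node:
            rows.append([node.get(k) for k in keys])
    return rows

# Hypothetical miniature of the question's input, reduced to two text objects.
sample = {"document": {"page": [
    {"@index": "0", "image": {"@data": "ABC", "@x": "85.00", "@y": "85.00"}},
    {"@index": "1", "row": [
        {"column": [{"text": ""},
                    {"text": {"#text": "Text1", "@x": "121.10", "@y": "83.42"}}]},
        {"column": {"text": {"#text": "Text3", "@x": "85.10", "@y": "69.62"}}},
    ]},
]}}

if __name__ == "__main__":
    for row in text_rows(sample):
        print("\t".join(row))
```

The image object is skipped because it has no "#text" key, and the empty-string "text" placeholders are skipped because they are not objects.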

Answer 1: (score: 1)

Here is a solution that uses "drill-down" and is therefore rather tedious:

["#text", "@x", "@y"],
( .document.page[]
  | .row[]?
  | .column
  | (if type == "array" then .[] else . end)
  | .text
  | objects
  | [.["#text", "@x", "@y"]]
)
| @tsv

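This is to be used with the -r command-line option.

I used @tsv because it produces output similar to the given expected output. As mentioned elsewhere on this page, there are alternatives, such as using join/1.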
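The key step in the drill-down is that "column" is sometimes an array and sometimes a single object, so it has to be normalized before iterating. A minimal Python sketch of that step follows; it assumes "row" is always a list when present, and the helper names and reduced sample are hypothetical.

```python
def as_list(value):
    """Wrap a lone value in a list so both shapes can be iterated alike."""
    return value if isinstance(value, list) else [value]

def drill_down(doc, keys=("#text", "@x", "@y")):
    """Follow document -> page -> row -> column -> text explicitly."""
    rows = [list(keys)]
    for page in doc["document"]["page"]:
        for row in page.get("row", []):            # pages without "row" are skipped
            for column in as_list(row.get("column")):
                text = column.get("text") if isinstance(column, dict) else None
                if isinstance(text, dict):          # skip empty-string placeholders
                    rows.append([text.get(k) for k in keys])
    return rows

# Hypothetical reduced input showing both shapes of "column".
sample = {"document": {"page": [
    {"@index": "0", "image": {"@data": "ABC"}},
    {"@index": "1", "row": [
        {"column": [{"text": ""},
                    {"text": {"#text": "Text1", "@x": "121.10", "@y": "83.42"}}]},
    ]},
    {"@index": "2", "row": [
        {"column": {"text": {"#text": "Text3", "@x": "85.10", "@y": "69.62"}}},
    ]},
    {"@index": "3"},
]}}

if __name__ == "__main__":
    for row in drill_down(sample):
        print("\t".join(row))
```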

Answer 2: (score: 1)

For those interested in an alternative solution, here is how to achieve the same with jtc, a walk-path based unix tool for JSON:

bash $ jtc -qq -w'<>a' -T'"#text\t@x\t@y"' -w'<@x>l:<x>v[-1][@y]<y>v[-1][#text]' -T'"{}\t{x}\t{y}"' file.json
#text   @x      @y
Text1   121.10  83.42
Text2   121.10  124.82
Text3   85.10   69.62
Text4   85.10   83.42
Text5   85.10   97.22
bash $ 

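Walk-path (-w) breakdown:

  • <@x>l:<x>v finds each label @x and memorizes the found JSON value in the namespace x
  • [-1][@y]<y>v addresses the parent (of the last found value), then addresses the JSON value by the label @y and stores it in the namespace y
  • [-1][#text] does the same for the label #text (note: this last value is not memorized)

- -T'"{}\t{x}\t{y}"': the template applied with interpolation ({} interpolates the last found value, so there is no need to store it in a namespace)

- -qq unquotes the resulting JSON strings (drops the quotation marks and converts \t into tabs)

- the first walk (-w'<>a') is just a dummy that triggers the template interpolation for the header line.

PS. Disclosure: I am the creator of jtc, a shell CLI tool for JSON manipulation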

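Answer 3: (score: 1)

Processing input2.json

Since the second set of requirements, corresponding to input2.json, calls for some context-dependent information, the context cannot be ignored, so the following solution uses the "drill-down" approach. Unless you understand foreach, the following will be hard to follow, so I will just mention that the approach essentially uses a state variable {counter, page, row} to keep track of the three counters.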

["counter", "page", "row", "#text", "@data", "@fontName", "@fontSize", "@format", "@height", "@type", "@width", "@x", "@y"], 
(foreach (.document.page[] | objects) as $page ({page: -1, counter: 0};
  .page += 1
  | foreach ($page | .row[]?) as $row (.row=-1;
    .row += 1
    | foreach ($row | (.column | (if type == "array" then .[] else . end )) | .text | objects) as $x (.;
      .counter += 1
      | .out = [.counter, .page, .row, $x["#text", "@data", "@fontName", "@fontSize", "@format", "@height", "@type", "@width", "@x", "@y"]]
      ; . )
      ; . )
      ; .out )
)
| @tsv

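This produces the required TSV except for the first data row, because that entry has no row. One way to include the first row is shown in the answer to Relate elements in table form from Json file with jq.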
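As a rough illustration of what the three counters track, here is a Python sketch of the same bookkeeping with plain loops instead of foreach: the page and row numbers come from the position in their arrays, and the counter increases once per emitted text object. It assumes "row" is a list when present; the names and the reduced sample are hypothetical. Like the jq program, it does not emit the image entry.

```python
KEYS = ["#text", "@data", "@fontName", "@fontSize", "@format",
        "@height", "@type", "@width", "@x", "@y"]

def as_list(value):
    """Wrap a lone value in a list so both shapes can be iterated alike."""
    return value if isinstance(value, list) else [value]

def rows_with_context(doc):
    """Return a header plus rows prefixed with counter, page and row numbers."""
    out = [["counter", "page", "row"] + KEYS]
    counter = 0
    pages = [p for p in doc["document"]["page"] if isinstance(p, dict)]
    for page_no, page in enumerate(pages):                   # page starts at 0
        for row_no, row in enumerate(page.get("row", [])):   # row restarts per page
            for column in as_list(row.get("column")):
                text = column.get("text") if isinstance(column, dict) else None
                if isinstance(text, dict):
                    counter += 1
                    out.append([counter, page_no, row_no]
                               + [text.get(k) for k in KEYS])
    return out

# Hypothetical reduced input: an image page followed by a page with two rows.
sample = {"document": {"page": [
    {"@index": "0", "image": {"@data": "ABC", "@format": "png"}},
    {"@index": "1", "row": [
        {"column": [{"text": ""},
                    {"text": {"#text": "Text1", "@x": "121.10", "@y": "83.42"}}]},
        {"column": {"text": {"#text": "Text3", "@x": "85.10", "@y": "69.62"}}},
    ]},
]}}

if __name__ == "__main__":
    for row in rows_with_context(sample):
        # Missing keys are printed as empty cells.
        print("\t".join("" if v is None else str(v) for v in row))
```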

Answer 4: (score: -1)

In this command:

$ cat file.json | jq .document.page[].row | ["#text", "@x", "@y"] | @csv

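Everything after jq is supposed to be jq's first argument, which means you need to quote it. In addition, cat file.json | is a Useless Use of Cat here; simply pass the filename to jq as an argument. So the correctly quoted command is:

$ jq '.document.page[].row | ["#text", "@x", "@y"] | @csv' file.json

Note that this fixes only the shell quoting: the filter still builds the literal array ["#text", "@x", "@y"] for each page rather than extracting the corresponding values.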