Struggling to parse JSON with jq

Posted: 2017-05-17 16:07:24

Tags: json bash parsing jq

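I've read every post related to this and have been playing with it for hours, but I still can't get the hang of this tool. It seems to be exactly what I need, if only I could find a way to make it work the way I want... So here is my sample JSON: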

{
  "res": "0",
  "main": {
    "All": [
      {
        "field1": "a",
        "field2": "aa",
        "field3": "aaa",
        "field4": "0",
        "active": "true",
        "id": "1"
      },
      {
        "field1": "b",
        "field2": "bb",
        "field3": "bbb",
        "field4": "0",
        "active": "false",
        "id": "2"
      },
      {
        "field1": "c",
        "field2": "cc",
        "field3": "ccc",
        "field4": "0",
        "active": "true",
        "id": "3"
      },
      {
        "field1": "d",
        "field2": "dd",
        "field3": "ddd",
        "field4": "0",
        "active": "true",
        "id": "4"
      }
    ]
  }
}

I want to extract only some of the fields and get CSV output like this:

field1,field2,field3,id
a,aa,aaa,1
b,bb,bbb,2
c,cc,ccc,3
d,dd,ddd,4

Note that I have skipped some fields, and I'm not interested in the parent arrays etc. either. Many thanks.

2 answers:

Answer 0 (score: 3)

First, your JSON (as originally posted) needs to be fixed as follows:

{
  "main": {

  },
  "table": {
    "All": [
      {
        "field1": "a",
        "field2": "aa",
        "field3": "aaa",
        "field4": "0",
        "active": "true",
        "id": "1"
      },
      {
        "field1": "b",
        "field2": "bb",
        "field3": "bbb",
        "field4": "0",
        "active": "false",
        "id": "2"
      },
      {
        "field1": "c",
        "field2": "cc",
        "field3": "ccc",
        "field4": "0",
        "active": "true",
        "id": "3"
      },
      {
        "field1": "d",
        "field2": "dd",
        "field3": "ddd",
        "field4": "0",
        "active": "true",
        "id": "4"
      }

    ]
  },
  "res": "0"
}

Second, with jq you can do the following to produce tabular output:

{ echo Field1 Field2 Field3 ID; jq -r '.table.All[] | (.field1, .field2, .field3, .id)' data.json | xargs -L4; } | column -t

Output:

Field1  Field2  Field3  ID
a       aa      aaa     1
b       bb      bbb     2
c       cc      ccc     3
d       dd      ddd     4

Using sed:

echo "field1,field2,field3,id"; jq -r '.table.All[] | (.field1, .field2, .field3, .id)' data.json | xargs -L4 | sed 's/ /,/g'

Output:

field1,field2,field3,id
a,aa,aaa,1
b,bb,bbb,2
c,cc,ccc,3
d,dd,ddd,4
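The xargs and sed pipelines above rely on every value being a single plain word: sed turns every space into a comma, so a value containing a space would become two columns, and xargs treats quote characters specially. A minimal sketch that avoids both tools by building each line with jq's string interpolation is shown here. It runs against a trimmed copy of the question's current JSON (hence .main.All rather than .table.All), it still does no quoting, so it assumes no value contains a comma, and the function name to_plain_csv is purely illustrative:

```shell
#!/bin/sh
# Sketch only: assumes jq is installed. The data is a trimmed copy of the question's JSON.
json=$(cat <<'EOF'
{"res": "0", "main": {"All": [
  {"field1": "a", "field2": "aa", "field3": "aaa", "field4": "0", "active": "true", "id": "1"},
  {"field1": "b", "field2": "bb", "field3": "bbb", "field4": "0", "active": "false", "id": "2"}
]}}
EOF
)

# Hypothetical helper: prints a header, then one comma-joined line per object.
# jq's "\(...)" interpolation builds each line; nothing is quoted or escaped.
to_plain_csv() {
  echo "field1,field2,field3,id"
  jq -r '.main.All[] | "\(.field1),\(.field2),\(.field3),\(.id)"'
}

printf '%s\n' "$json" | to_plain_csv
```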

Update

Without sed or xargs, jq itself can format the output as CSV:

jq -r '.table.All[] | [.field1, .field2, .field3, .id] | @csv' data.json

Output:

"a","aa","aaa","1"
"b","bb","bbb","2"
"c","cc","ccc","3"
"d","dd","ddd","4"
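Every value above, including id, is quoted because @csv quotes strings but emits numbers bare, and every value in this JSON is a string. A minimal sketch, assuming you would rather have id as an unquoted number, converts it with tonumber before @csv. It uses a trimmed copy of the question's current JSON (so .main.All), and to_csv_numeric_id is an illustrative name:

```shell
#!/bin/sh
# Sketch only: assumes jq is installed. The data is a trimmed copy of the question's JSON.
json=$(cat <<'EOF'
{"res": "0", "main": {"All": [
  {"field1": "a", "field2": "aa", "field3": "aaa", "field4": "0", "active": "true", "id": "1"},
  {"field1": "b", "field2": "bb", "field3": "bbb", "field4": "0", "active": "false", "id": "2"}
]}}
EOF
)

# Hypothetical helper: tonumber turns the string "1" into the number 1,
# which @csv then writes without quotes; the other fields stay quoted strings.
to_csv_numeric_id() {
  jq -r '.main.All[] | [.field1, .field2, .field3, (.id | tonumber)] | @csv'
}

printf '%s\n' "$json" | to_csv_numeric_id
```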

Thanks to chepner, who pointed out in a comment that the header can be added directly with jq:

jq -r '(([["field1", "field2", "field3", "id"]]) + [(.table.All[] | [.field1,.field2,.field3,.id])])[]|@csv' data.json

Output:

"field1","field2","field3","id"
"a","aa","aaa","1"
"b","bb","bbb","2"
"c","cc","ccc","3"
"d","dd","ddd","4"
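The same header-plus-rows output can also be written with jq's comma operator: it emits the header array and then one array per object, and because | binds more loosely than , the single @csv at the end applies to every row. A sketch, again against a trimmed copy of the question's current JSON (so .main.All); to_csv_with_header is an illustrative name:

```shell
#!/bin/sh
# Sketch only: assumes jq is installed. The data is a trimmed copy of the question's JSON.
json=$(cat <<'EOF'
{"res": "0", "main": {"All": [
  {"field1": "a", "field2": "aa", "field3": "aaa", "field4": "0", "active": "true", "id": "1"},
  {"field1": "b", "field2": "bb", "field3": "bbb", "field4": "0", "active": "false", "id": "2"}
]}}
EOF
)

# Hypothetical helper: "HEADER, (ROWS) | @csv" parses as "(HEADER, ROWS) | @csv",
# so the header array and each row array are all formatted as CSV lines.
to_csv_with_header() {
  jq -r '["field1", "field2", "field3", "id"], (.main.All[] | [.field1, .field2, .field3, .id]) | @csv'
}

printf '%s\n' "$json" | to_csv_with_header
```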

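This command should work with the latest JSON data you provided in the question, where the array sits under main rather than table: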

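jq -r '(([["field1", "field2", "field3", "id"]]) + [(.main.All[] | [.field1,.field2,.field3,.id])])[]|@csv' data.json

([["field1", "field2", "field3", "id"]]): the first part of the command produces the CSV header.

(.main.All[] | [.field1, .field2, .field3, .id]): main is a key of the top-level object, so .main selects it and .main.All selects the array All. To iterate over the contents of that array, append [] to its name, giving .main.All[], which emits each object (dictionary) in turn. Piping that output into another array listing the keys we want, [.field1, .field2, .field3, .id], picks out just those values from each object. Finally, the header array and the rows are concatenated with +, and the trailing [] emits each row in turn so that @csv can format it.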
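To see what each stage of that filter produces, it can help to run the stages separately with -c (compact, one result per line). A sketch against a trimmed copy of the question's JSON:

```shell
#!/bin/sh
# Sketch only: assumes jq is installed. The data is a trimmed copy of the question's JSON.
json=$(cat <<'EOF'
{"res": "0", "main": {"All": [
  {"field1": "a", "field2": "aa", "field3": "aaa", "field4": "0", "active": "true", "id": "1"},
  {"field1": "b", "field2": "bb", "field3": "bbb", "field4": "0", "active": "false", "id": "2"}
]}}
EOF
)

# Stage 1: .main.All[] emits each object in the All array as a separate result.
printf '%s\n' "$json" | jq -c '.main.All[]'

# Stage 2: piping each object into [...] keeps only the chosen values, in that order.
printf '%s\n' "$json" | jq -c '.main.All[] | [.field1, .field2, .field3, .id]'
```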

Answer 1 (score: 2)

Here is a jq-only solution in which the desired keys need to be specified only once, e.g. on the command line:

jq -r --argjson f '["field1", "field2", "field3", "id"]' '
  $f, (.table.All[] | [getpath($f[] | [.])]) | @csv' data.json

Output:

"field1","field2","field3","id"
"a","aa","aaa","1"
"b","bb","bbb","2"
"c","cc","ccc","3"
"d","dd","ddd","4"
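The getpath part works because $f[] | [.] turns each key name into a one-element path such as ["field1"], and getpath looks that path up in the current object; wrapping the call in [...] collects one value per key, in the order given in $f. A sketch showing both steps on a single object (the sample object and the helper names paths and pick are made up for illustration):

```shell
#!/bin/sh
# Sketch only: assumes jq 1.5 or later (for --argjson) is installed.
f='["field1", "field2", "field3", "id"]'

# Step 1 (hypothetical helper): each key name becomes a path array: ["field1"], ["field2"], ...
paths() { jq -nc --argjson f "$f" '$f[] | [.]'; }

# Step 2 (hypothetical helper): getpath() resolves each path against the input object,
# and [...] gathers the results into one array in key-list order.
pick() { jq -c --argjson f "$f" '[getpath($f[] | [.])]'; }

paths
echo '{"id": "1", "field1": "a", "field2": "aa", "field3": "aaa", "active": "true"}' | pick
```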

Dropping the quotes

One way to avoid quoted strings is to use join(",") (or join(", ")) in place of @csv, which gives:

field1,field2,field3,id
a,aa,aaa,1
b,bb,bbb,2
c,cc,ccc,3
d,dd,ddd,4

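Of course, this may be unacceptable if any value contains a comma. In general, if avoiding quotes around strings is important, @tsv is a good option to consider.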
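Both alternatives can be sketched with the same key list. join(",") simply concatenates the values with commas, while @tsv separates them with tabs and, in recent jq versions, escapes tabs, newlines and backslashes inside values rather than quoting them. As before, the data is a trimmed copy of the question's current JSON, so the path is .main.All; to_joined and to_tsv are illustrative names:

```shell
#!/bin/sh
# Sketch only: assumes jq 1.5 or later is installed. The data is a trimmed copy of the question's JSON.
json=$(cat <<'EOF'
{"res": "0", "main": {"All": [
  {"field1": "a", "field2": "aa", "field3": "aaa", "field4": "0", "active": "true", "id": "1"},
  {"field1": "b", "field2": "bb", "field3": "bbb", "field4": "0", "active": "false", "id": "2"}
]}}
EOF
)
f='["field1", "field2", "field3", "id"]'

# Hypothetical helper: plain comma-joined rows; a value containing a comma would break the columns.
to_joined() { jq -r --argjson f "$f" '$f, (.main.All[] | [getpath($f[] | [.])]) | join(",")'; }

# Hypothetical helper: tab-separated rows without quotes around strings.
to_tsv() { jq -r --argjson f "$f" '$f, (.main.All[] | [getpath($f[] | [.])]) | @tsv'; }

printf '%s\n' "$json" | to_joined
printf '%s\n' "$json" | to_tsv
```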