Extracting data from unusual JSON

Time: 2016-02-25 09:47:43

Tags: json csv hierarchical-data jq

Is there a way to produce a nice CSV from the following JSON?

{
    "cod:e1!!@23" : {
        "typeA" : {
            "lsk:d##fjd": {
                "title" : "slkdfjlkdjfd",
                "year" : "2014"
            },
            "sdfdsfsd" : {
                "title" : "slkdfjlkdjfddewfsdfd",
                "year" : "2015"
            }
        },
        "Ct@ype" : {
            "sd$!!fs:$dfds" : {
                "title" : "slkdfjsdfsdfdsfsd",
                "year" : "2012"
            }
        }
    }
}

Here is what I tried in jq:

jq -rc 'keys[] as $x 
  | .[]|keys[] as $y
  | .[]|keys[] as $z
  |.[]
  |[$x,$y,$z,.year] | @csv'

jq -rc 'keys_unsorted[] as $x
  | .[]|keys_unsorted[] as $y
  | .[]|keys_unsorted[] as $z
  | .[]|[$x,$y,$z,.year] | @csv'

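But the output is incorrect: when there are several such records, the keys get sorted and permuted. I also tried keys_unsorted, but it did not solve the problem.

Fixing the generation of the original JSON is not an option at this point, so any help would be appreciated.

Ideally, I would get: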

"cod:e1!!@23","typeA","lsk:d##fjd","slkdfjlkdjfd","2014"
"cod:e1!!@23","typeA","sdfdsfsd","slkdfjlkdjfddewfsdfd","2015"
"cod:e1!!@23","Ct@ype","sd$!!fs:$dfds","slkdfjsdfsdfdsfsd","2012"

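4 answers:

Answer 0: (score: 1)

A small modification to the script in your initial post makes it work. Instead of using .[], I index by the specific key saved as a variable from keys_unsorted, so each key is paired with its own value. For convenience, I also added a header row to the CSV: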

jq -r '["x", "y", "z", "title", "year"],
  (keys_unsorted[] as $x
   | .[$x] | keys_unsorted[] as $y
   | .[$y] | keys_unsorted[] as $z
   | .[$z] | [$x, $y, $z, .title, .year]) | @csv'

This does give the output you are looking for (with a header):

"x","y","z","title","year"
"cod:e1!!@23","typeA","lsk:d##fjd","slkdfjlkdjfd","2014"
"cod:e1!!@23","typeA","sdfdsfsd","slkdfjlkdjfddewfsdfd","2015"
"cod:e1!!@23","Ct@ype","sd$!!fs:$dfds","slkdfjsdfsdfdsfsd","2012"

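For readers who want to cross-check the idea outside jq, here is a minimal Python sketch of the difference between iterating over every value (the `.[]` in the question) and indexing by the saved key (the `.[$x]` in this answer). It is an illustration under assumptions, not the answerer's code: the function names `rows_cross_product`, `rows_by_key`, and `to_csv` are hypothetical, and the sample document is copied from the question.

```python
import csv
import io

# The sample document from the question.
SAMPLE = {
    "cod:e1!!@23": {
        "typeA": {
            "lsk:d##fjd": {"title": "slkdfjlkdjfd", "year": "2014"},
            "sdfdsfsd": {"title": "slkdfjlkdjfddewfsdfd", "year": "2015"},
        },
        "Ct@ype": {
            "sd$!!fs:$dfds": {"title": "slkdfjsdfsdfdsfsd", "year": "2012"},
        },
    }
}


def rows_cross_product(doc):
    """Mimic `keys[] as $k | .[]`: every key is paired with every value."""
    rows = []
    for x in doc:
        for level1 in doc.values():
            for y in level1:
                for level2 in level1.values():
                    for z in level2:
                        for leaf in level2.values():
                            rows.append([x, y, z, leaf["year"]])
    return rows


def rows_by_key(doc):
    """Mimic `keys_unsorted[] as $k | .[$k]`: each key keeps its own value."""
    rows = []
    for x, level1 in doc.items():
        for y, level2 in level1.items():
            for z, leaf in level2.items():
                rows.append([x, y, z, leaf["title"], leaf["year"]])
    return rows


def to_csv(rows):
    """Quote every field, similar to what jq's @csv does for strings."""
    buf = io.StringIO()
    csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n").writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(len(rows_cross_product(SAMPLE)), "rows from the mismatched traversal")
    print(to_csv(rows_by_key(SAMPLE)), end="")
```

The mismatched traversal pairs keys with values from sibling objects, which is one plausible reading of the "permuted" output described in the question; indexing by key yields exactly one row per record.
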
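Answer 1: (score: 1)

The following gives a general solution for regularly structured nested objects (loosely speaking, they can be thought of as "babushka objects", like nesting dolls); moreover, the keys within an object may appear in any order.

The key concept is the "scalar object": an object whose keys all have scalar values.

A template for extracting information from a "scalar object" is supplied as the argument to the 'emit' filter, and is used to ensure that the proper column order is maintained when producing the CSV rows.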

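def emit(template):

  # True if the input is an object whose values are all scalars.
  def is_scalar_object:
    def is_scalar: type | ((. != "object") and (. != "array"));
    . as $in | (type == "object") and all($in[] | is_scalar);

  . as $in
  | paths as $path
  # Keep only the paths that lead to a scalar object.
  | select(getpath($path) | is_scalar_object)
  # Row = keys along the path, then the values in template order.
  | $path + [ template + ($in | getpath($path)) | .[]]
  ;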


emit( {title, year} ) | @csv

Usage:

 jq -r -f emit.jq input.json

Output:

"cod:e1!!@23","typeA","lsk:d##fjd","slkdfjlkdjfd","2014"
"cod:e1!!@23","typeA","sdfdsfsd","slkdfjlkdjfddewfsdfd","2015"
"cod:e1!!@23","Ct@ype","sd$!!fs:$dfds","slkdfjsdfsdfdsfsd","2012"

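The "scalar object" idea can also be sketched in Python, as a rough illustration rather than a port of the jq above. The names `is_scalar_object` and `emit` mirror the jq definitions, but the signatures are assumptions made for this sketch. Two simplifications are worth flagging: only nested objects are traversed (arrays are not descended into), and fields not named in the template are dropped instead of being appended.

```python
# The sample document from the question.
SAMPLE = {
    "cod:e1!!@23": {
        "typeA": {
            "lsk:d##fjd": {"title": "slkdfjlkdjfd", "year": "2014"},
            "sdfdsfsd": {"title": "slkdfjlkdjfddewfsdfd", "year": "2015"},
        },
        "Ct@ype": {
            "sd$!!fs:$dfds": {"title": "slkdfjsdfsdfdsfsd", "year": "2012"},
        },
    }
}


def is_scalar_object(value):
    """True for a dict whose values are all scalars (no dicts or lists)."""
    return isinstance(value, dict) and all(
        not isinstance(v, (dict, list)) for v in value.values()
    )


def emit(node, template, path=()):
    """Yield [keys on the path..., template fields...] per scalar object.

    `template` is a sequence of field names; it fixes the column order
    regardless of the key order inside each object. Missing fields
    become None.
    """
    if is_scalar_object(node):
        yield list(path) + [node.get(field) for field in template]
    elif isinstance(node, dict):
        for key, child in node.items():
            yield from emit(child, template, path + (key,))


if __name__ == "__main__":
    for row in emit(SAMPLE, ("title", "year")):
        print(row)
```

Because the template fixes the column order, an object written as `{"year": ..., "title": ...}` still produces title before year.
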
Answer 2: (score: 0)

You can use "https://github.com/zemirco/json2csv" with its flatten option. This will produce columns such as cod:e1!!@23.typeA.lsk:d##fjd.title.

cat input.json | json2csv -F >> output.csv

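Edit: this is not what you want.

Answer 3: (score: 0)

A jq script that walks the "leaf elements" of the input, producing a CSV column from each key it passes through, gives the following: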

"cod:e1!!@23","typeA","lsk:d##fjd","title","slkdfjlkdjfd"
"cod:e1!!@23","typeA","lsk:d##fjd","year","2014"
"cod:e1!!@23","typeA","sdfdsfsd","title","slkdfjlkdjfddewfsdfd"
"cod:e1!!@23","typeA","sdfdsfsd","year","2015"
"cod:e1!!@23","Ct@ype","sd$!!fs:$dfds","title","slkdfjsdfsdfdsfsd"
"cod:e1!!@23","Ct@ype","sd$!!fs:$dfds","year","2012"

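As a rough illustration of the leaf-walking idea, here is a short Python sketch that reproduces the rows listed above. It is not the answerer's jq script; the function name `leaf_rows` is hypothetical, and the sketch assumes the input consists only of nested objects, treating any non-object value as a leaf.

```python
# The sample document from the question.
SAMPLE = {
    "cod:e1!!@23": {
        "typeA": {
            "lsk:d##fjd": {"title": "slkdfjlkdjfd", "year": "2014"},
            "sdfdsfsd": {"title": "slkdfjlkdjfddewfsdfd", "year": "2015"},
        },
        "Ct@ype": {
            "sd$!!fs:$dfds": {"title": "slkdfjsdfsdfdsfsd", "year": "2012"},
        },
    }
}


def leaf_rows(node, path=()):
    """Yield one row per leaf: every key on the way down, then the value."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from leaf_rows(child, path + (key,))
    else:
        yield list(path) + [node]


if __name__ == "__main__":
    for row in leaf_rows(SAMPLE):
        print(row)
```

Each record contributes one row per field, which is why the sample produces six rows instead of three.
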
Note that this is not exactly what you are looking for: each title and each year ends up on its own row.
