jq: Conditionally update/replace/add JSON elements using an input file

Date: 2017-10-05 21:46:16

Tags: json jq

I have the following input file:

input.json:

[
 {"ID":"aaa_12301248","time_CET":"00:00:00","VALUE":10,"FLAG":"0"},
 {"ID":"aaa_12301248","time_CET":"00:15:00","VALUE":18,"FLAG":"0"},
 {"ID":"aaa_12301248","time_CET":"00:30:00","VALUE":160,"FLAG":"0"},

 {"ID":"bbb_0021122","time_CET":"00:00:00","VALUE":null,"FLAG":"?"},
 {"ID":"bbb_0021122","time_CET":"00:15:00","VALUE":null,"FLAG":"?"},
 {"ID":"bbb_0021122","time_CET":"00:30:00","VALUE":22,"FLAG":"0"},

 {"ID":"ccc_0021122","time_CET":"00:00:00","VALUE":null,"FLAG":"?"},
 {"ID":"ccc_0021122","time_CET":"00:15:00","VALUE":null,"FLAG":"?"},
 {"ID":"ccc_0021122","time_CET":"00:30:00","VALUE":20,"FLAG":"0"},

 {"ID":"ddd_122455","time_CET":"00:00:00","VALUE":null,"FLAG":"?"},
 {"ID":"ddd_122455","time_CET":"00:15:00","VALUE":null,"FLAG":"?"},
 {"ID":"ddd_122455","time_CET":"00:30:00","VALUE":null,"FLAG":"?"}
]

As you can see, there are some valid values (FLAG: "0") and some invalid values (FLAG: "?"). I also have files that look like this (one per ID):

  

aaa.json:

[
  {"ID":"aaa_12301248","time_CET":"00:00:00","VALUE":10,"FLAG":"0"},
  {"ID":"aaa_12301248","time_CET":"00:15:00","VALUE":null,"FLAG":"?"},
  {"ID":"aaa_12301248","time_CET":"00:55:00","VALUE":45,"FLAG":"0"}
]

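As you can see, object 1 is identical to the corresponding object in input.json, but object 2 is invalid (FLAG: "?"). That is why object 2 has to be replaced with the correct object from input.json (the one with VALUE: 18). Objects can be identified by their "time_CET" and "ID" elements.

In addition, input.json will contain new objects that are not yet part of aaa.json and the other per-ID files. These objects should be added to the array, and the valid objects already in aaa.json should be kept.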

  

In the end, aaa.json should look like this:

[
  {"ID":"aaa_12301248","time_CET":"00:00:00","VALUE":10,"FLAG":"0"},
  {"ID":"aaa_12301248","time_CET":"00:15:00","VALUE":18,"FLAG":"0"},
  {"ID":"aaa_12301248","time_CET":"00:30:00","VALUE":160,"FLAG":"0"},
  {"ID":"aaa_12301248","time_CET":"00:55:00","VALUE":45,"FLAG":"0"}
]

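So, to summarize:

  1. Find the objects with FLAG: "?" in aaa.json.
  2. Replace each of them with the matching object from input.json, using "ID" and "time_CET" for the mapping.
  3. Keep the existing valid objects, and add the objects from input.json that did not previously exist in aaa.json (meaning only objects whose "ID" field starts with "aaa").
  4. Repeat this for bbb.json, ccc.json and ddd.json.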

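I am not sure whether this can be done in one go with a command like the following, because the output has to go back to the correct ID file (aaa.json, bbb.json, ccc.json):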

    jq --argfile aaa aaa.json --argfile bbb bbb.json .... -f prog.jq input.json
    

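The problem is that the numbers after the identifier (aaa, bbb, ccc, etc.) can change. So, to make sure the objects are added to the correct file/array, a statement like this would be needed: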
     if (."ID"|contains("aaa")) then ....

Or is it better to run the program several times with different input parameters? I am not sure.
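One way to avoid hard-coding the digits is to work only with the part of "ID" before the underscore. The following is a minimal sketch, assuming every ID has the form prefix_digits as in the sample data; the inline array is a shortened, hypothetical stand-in for input.json. It also shows `startswith`, which is stricter than `contains` because it only matches at the beginning of the string.

```shell
#!/bin/bash
# Hypothetical, shortened stand-in for input.json.
sample='[
  {"ID":"aaa_12301248","time_CET":"00:00:00","VALUE":10,"FLAG":"0"},
  {"ID":"bbb_0021122","time_CET":"00:30:00","VALUE":22,"FLAG":"0"},
  {"ID":"aaa_12301248","time_CET":"00:15:00","VALUE":18,"FLAG":"0"}
]'

# Take the text before the first "_" of each ID and list the distinct prefixes.
prefixes=$(printf '%s' "$sample" | jq -r '[.[].ID | split("_")[0]] | unique | .[]')
echo "$prefixes"

# Count the objects that belong to one prefix, matching only at the start of ID.
aaa_count=$(printf '%s' "$sample" | jq '[.[] | select(.ID | startswith("aaa_"))] | length')
echo "$aaa_count"
```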

Thanks in advance!

1 Answer:

Answer 0: (score: 1)

Here is one approach:

#!/bin/bash

# usage: update.sh input.json aaa.json bbb.json....
# updates each of aaa.json bbb.json.... in place
# requires: jq, and sponge from the moreutils package

input_json="$1"
shift

for i in "$@"; do
    jq -M --argfile input_json "$input_json" '

      # functions to restrict input.json to keys of current xxx.json file
      def prefix:              input_filename | split(".")[0];
      def selectprefix:        select(.ID | startswith(prefix));

      # functions to build and probe a lookup table
      def pk:                  [.ID, .time_CET];
      def lookup($t;$k):       $t | getpath($k);
      def lookup($t):          lookup($t;pk);
      def organize(s):         reduce s as $r ({}; setpath($r|pk; $r));

      # functions to identify objects in input.json missing from xxx.json
      def pks:                 paths | select(length==2);
      def missing($t1;$t2):    [$t1|pks] - [$t2|pks] | .[];
      def getmissing($t1;$t2): [ missing($t1;$t2) as $p | lookup($t1;$p)];

      # main routine
        organize(.[]) as $xxx
      | organize($input_json[] | selectprefix) as $inp
      | map(if .FLAG != "?" then . else . += lookup($inp) end)
      | . + getmissing($inp;$xxx)

    ' "$i" | sponge "$i"

done

The script runs jq in a loop to read and update each of the aaa.json ... files.
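The jq 1.6 manual describes `--argfile` as "Do not use" and points to `--slurpfile` instead. As a hedged sketch of what changes if you switch (this is an addition, not part of the answer's script): `--slurpfile` binds an array of all JSON texts found in the file, so the original top-level array becomes element 0. The scratch directory and the one-record input.json below are illustrative only.

```shell
#!/bin/bash
# Illustrative scratch directory with a one-record input.json.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo '[{"ID":"aaa_12301248","time_CET":"00:15:00","VALUE":18,"FLAG":"0"}]' > input.json

# $input_json is an array of JSON texts; [0] is the file's top-level array,
# and the second [0] is the first record inside it.
value=$(jq -n --slurpfile input_json input.json '$input_json[0][0].VALUE')
echo "$value"
```

In the script above, this would mean writing `$input_json[0][]` wherever `$input_json[]` is used.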

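The filter builds temporary objects for looking up values by [ID, time_CET], updates any values in aaa.json that have FLAG == "?", and finally appends any values from input.json that are missing from aaa.json.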
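The lookup-table idea can be tried in isolation. The sketch below uses two hypothetical inline records rather than the real files: it builds the nested table (ID, then time_CET, then the object) with `setpath` and probes it with `getpath`, the same two builtins that `organize` and `lookup` rely on.

```shell
#!/bin/bash
# Two hypothetical records standing in for the filtered input.json.
records='[
  {"ID":"aaa_12301248","time_CET":"00:15:00","VALUE":18,"FLAG":"0"},
  {"ID":"aaa_12301248","time_CET":"00:30:00","VALUE":160,"FLAG":"0"}
]'

# Build the table keyed by ID, then time_CET, and look up one key pair.
found=$(printf '%s' "$records" | jq -c '
  reduce .[] as $r ({}; setpath([$r.ID, $r.time_CET]; $r))
  | getpath(["aaa_12301248", "00:15:00"]) | .VALUE')
echo "$found"

# A key pair that is absent from the table yields null instead of an error.
absent=$(printf '%s' "$records" | jq -c '
  reduce .[] as $r ({}; setpath([$r.ID, $r.time_CET]; $r))
  | getpath(["aaa_12301248", "00:55:00"])')
echo "$absent"
```

Because a missing key gives null, and adding null to an object leaves the object unchanged, an invalid entry with no replacement in input.json simply stays as it is.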

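The temporary lookup table for input.json uses input_filename so that it only contains keys starting with a prefix that matches the name of the file currently being processed.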
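`input_filename` returns the name as it was given on the command line, so `split(".")[0]` only yields a clean prefix when the file is passed as a bare name such as aaa.json. A hedged variant, assuming the files may live in another directory, is to compute the prefix in the shell with `basename` and hand it to jq with `--arg`. The scratch directory, the data/ folder and the variable names below are illustrative.

```shell
#!/bin/bash
# Illustrative layout: a data/ directory inside a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir data
echo '[]' > data/aaa.json

# Derived inside jq, the prefix keeps the directory part of the path.
from_jq=$(jq -r 'input_filename | split(".")[0]' data/aaa.json)
echo "$from_jq"

# Derived in the shell, basename strips both the directory and the suffix.
prefix=$(basename data/aaa.json .json)
from_arg=$(jq -r --arg prefix "$prefix" '$prefix' data/aaa.json)
echo "$from_arg"
```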

Sample run:

$ ./update.sh input.json aaa.json
aaa.json after the run:

[
  {
    "ID": "aaa_12301248",
    "time_CET": "00:00:00",
    "VALUE": 10,
    "FLAG": "0"
  },
  {
    "ID": "aaa_12301248",
    "time_CET": "00:15:00",
    "VALUE": 18,
    "FLAG": "0"
  },
  {
    "ID": "aaa_12301248",
    "time_CET": "00:55:00",
    "VALUE": 45,
    "FLAG": "0"
  },
  {
    "ID": "aaa_12301248",
    "time_CET": "00:30:00",
    "VALUE": 160,
    "FLAG": "0"
  }
]
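Note that the appended 00:30:00 object ends up after 00:55:00, whereas the desired output in the question lists the objects in time order. If the order matters, one option (an addition, not part of the original answer) is to sort the result with `sort_by`. A minimal sketch on a hypothetical, shortened array:

```shell
#!/bin/bash
# Hypothetical, shortened version of the updated aaa.json.
updated='[
  {"ID":"aaa_12301248","time_CET":"00:55:00","VALUE":45,"FLAG":"0"},
  {"ID":"aaa_12301248","time_CET":"00:30:00","VALUE":160,"FLAG":"0"}
]'

# Sort by ID first, then by time; zero-padded HH:MM:SS strings sort chronologically.
times=$(printf '%s' "$updated" | jq -c 'sort_by(.ID, .time_CET) | map(.time_CET)')
echo "$times"
```

In the script, this would amount to ending the main routine with `| sort_by(.ID, .time_CET)`.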