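jq: Convert JSON object values to arrays

Date: 2017-11-13 21:16:35

Tags: arrays json grouping jq bucket

I have the following array of objects (this is just an excerpt; the real objects are also larger):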

[{
    "DATE": "10.10.2017 01:00",
    "ID": "X",
    "VALUE_ONE": 20,
    "VALUE_TWO": 5
  },
  {
    "DATE": "10.10.2017 02:00",
    "ID": "X",
    "VALUE_ONE": 30,
    "VALUE_TWO": 7
  },
  {
    "DATE": "10.10.2017 03:00",
    "ID": "X",
    "VALUE_ONE": 25,
    "VALUE_TWO": 2
  },

  {
    "DATE": "10.10.2017 01:00",
    "ID": "Y",
    "VALUE_ONE": 10,
    "VALUE_TWO": 9
  },
  {
    "DATE": "10.10.2017 02:00",
    "ID": "Y",
    "VALUE_ONE": 20,
    "VALUE_TWO": 5
  },
  {
    "DATE": "10.10.2017 03:00",
    "ID": "Y",
    "VALUE_ONE": 50,
    "VALUE_TWO": 5
  },

  {
    "DATE": "10.10.2017 01:00",
    "ID": "Z",
    "VALUE_ONE": 55,
    "VALUE_TWO": 3
  },
  {
    "DATE": "10.10.2017 02:00",
    "ID": "Z",
    "VALUE_ONE": 60,
    "VALUE_TWO": 7
  },
  {
    "DATE": "10.10.2017 03:00",
    "ID": "Z",
    "VALUE_ONE": 15,
    "VALUE_TWO": 7
  }
]

To simplify handling in the web application and to reduce the file size, I want to convert the "VALUE_ONE", "VALUE_TWO" and "DATE" values into arrays per "ID", like this:

[{
    "DATE": ["10.10.2017 01:00", "10.10.2017 02:00", "10.10.2017 03:00"],
    "ID": "X",
    "VALUE_ONE": [20, 30, 25],
    "VALUE_TWO": [5, 7, 2]
  },
  {
    "DATE": ["10.10.2017 01:00", "10.10.2017 02:00", "10.10.2017 03:00"],
    "ID": "Y",
    "VALUE_ONE": [10, 20, 50],
    "VALUE_TWO": [9, 5, 5]
  },
  {
    "DATE": ["10.10.2017 01:00", "10.10.2017 02:00", "10.10.2017 03:00"],
    "ID": "Z",
    "VALUE_ONE": [55, 60, 15],
    "VALUE_TWO": [3, 7, 7]
  }
]

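What matters here is that you must still be able to find the values associated with a specific time (date). Since the input values for "DATE" are consecutive, you most likely no longer need the DATE values to find the requested "VALUE.." values. You could just use the array index (index=0 is always 10.10.2017 01:00, index=1 is ... 02:00, and so on). Is it possible to do this? It would make the file size even smaller. Thanks!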
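The lookup described above can be sketched as follows, assuming the compact layout from the desired output and that DATE is kept. The shell variables `compact` and `result` and the `--arg` names `id` and `date` are made up for this illustration. The filter finds the position of the requested date and reads both value arrays at that position.

```shell
# Compact form of the data for ID "Y", as in the desired output above.
compact='[{"DATE":["10.10.2017 01:00","10.10.2017 02:00","10.10.2017 03:00"],"ID":"Y","VALUE_ONE":[10,20,50],"VALUE_TWO":[9,5,5]}]'

# Locate the requested date, then index the value arrays with that position.
result=$(printf '%s' "$compact" | jq -c --arg id Y --arg date '10.10.2017 02:00' '
  .[] | select(.ID == $id)
  | (.DATE | index($date)) as $i
  | {VALUE_ONE: .VALUE_ONE[$i], VALUE_TWO: .VALUE_TWO[$i]}')
echo "$result"
```

For this input the script prints {"VALUE_ONE":20,"VALUE_TWO":5}. If DATE is dropped from the file, the position has to be computed from the known start time instead, for example `.VALUE_ONE[1]` for 02:00.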

4 answers:

Answer 0 (score: 1)

A two-step reduce (it doesn't look pretty, but it works):

jq 'reduce group_by(.ID)[] as $a ([]; . + [ reduce $a[] as $o 
   ({"DATE":[],"VALUE_ONE":[],"VALUE_TWO":[]}; 
    .DATE |= .+ [$o.DATE] | .ID = $o.ID |.VALUE_ONE |= .+ [$o.VALUE_ONE] 
    | .VALUE_TWO |= .+ [$o.VALUE_TWO]) ] )' input.json

Output:

[
  {
    "DATE": [
      "10.10.2017 01:00",
      "10.10.2017 02:00",
      "10.10.2017 03:00"
    ],
    "VALUE_ONE": [
      20,
      30,
      25
    ],
    "VALUE_TWO": [
      5,
      7,
      2
    ],
    "ID": "X"
  },
  {
    "DATE": [
      "10.10.2017 01:00",
      "10.10.2017 02:00",
      "10.10.2017 03:00"
    ],
    "VALUE_ONE": [
      10,
      20,
      50
    ],
    "VALUE_TWO": [
      9,
      5,
      5
    ],
    "ID": "Y"
  },
  {
    "DATE": [
      "10.10.2017 01:00",
      "10.10.2017 02:00",
      "10.10.2017 03:00"
    ],
    "VALUE_ONE": [
      55,
      60,
      15
    ],
    "VALUE_TWO": [
      3,
      7,
      7
    ],
    "ID": "Z"
  }
]

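Answer 1 (score: 0)

The following solution avoids group_by for two reasons:

  • efficiency
  • the sort that group_by uses in jq version 1.5 may not be stable, which complicates things.

Instead, we use bucketize, defined as follows: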

def bucketize(f): reduce .[] as $x ({}; .[$x|f] += [$x] );
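To see what bucketize produces, here is a small sketch on made-up input (the key `V` and the shell variables `input` and `buckets` are illustrative only). It assumes that f yields a string, since the result of f is used as an object key.

```shell
input='[{"ID":"X","V":1},{"ID":"Y","V":2},{"ID":"X","V":3}]'

buckets=$(printf '%s' "$input" | jq -c '
  def bucketize(f): reduce .[] as $x ({}; .[$x|f] += [$x]);
  # Keep only V in each bucket so the grouping is easy to read.
  bucketize(.ID) | map_values(map(.V))')
echo "$buckets"
```

This prints {"X":[1,3],"Y":[2]}: items end up in the bucket for their ID, in the order in which they appear in the input, and no sorting is involved.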

For simplicity, we will also define the following helper function:

# compactify an array with a single ID
def compact:
  . as $in
  | reduce (.[0]|keys_unsorted[]) as $key ({};
      . + {($key): $in|map(.[$key])})
    + {"ID": .[0].ID}
    ;

Solution

[bucketize(.ID)[] | compact]

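This ensures that everything works properly even if the set of dates differs between IDs, and even if the JSON objects are not initially grouped by date.

(If you want to drop "DATE" entirely from the final result, replace the call to compact in the line above with compact | del(.DATE).)

Output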

[
  {
    "DATE": [
      "10.10.2017 01:00",
      "10.10.2017 02:00",
      "10.10.2017 03:00"
    ],
    "ID": "X",
    "VALUE_ONE": [
      20,
      30,
      25
    ],
    "VALUE_TWO": [
      5,
      7,
      2
    ]
  },
  {
    "DATE": [
      "10.10.2017 01:00",
      "10.10.2017 02:00",
      "10.10.2017 03:00"
    ],
    "ID": "Y",
    "VALUE_ONE": [
      10,
      20,
      50
    ],
    "VALUE_TWO": [
      9,
      5,
      5
    ]
  },
  {
    "DATE": [
      "10.10.2017 01:00",
      "10.10.2017 02:00",
      "10.10.2017 03:00"
    ],
    "ID": "Z",
    "VALUE_ONE": [
      55,
      60,
      15
    ],
    "VALUE_TWO": [
      3,
      7,
      7
    ]
  }
]
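For reference, the pieces of this answer can be assembled into one invocation. The sketch below uses the del(.DATE) variant mentioned above on a shortened, interleaved input. The shell variables `input` and `result` are made up for this illustration, and the parentheses around the object value in compact were added for readability.

```shell
input='[{"DATE":"10.10.2017 01:00","ID":"X","VALUE_ONE":20,"VALUE_TWO":5},
        {"DATE":"10.10.2017 01:00","ID":"Y","VALUE_ONE":10,"VALUE_TWO":9},
        {"DATE":"10.10.2017 02:00","ID":"X","VALUE_ONE":30,"VALUE_TWO":7}]'

result=$(printf '%s' "$input" | jq -c '
  def bucketize(f): reduce .[] as $x ({}; .[$x|f] += [$x]);
  # Turn an array of objects sharing one ID into a single object of arrays.
  def compact:
    . as $in
    | reduce (.[0]|keys_unsorted[]) as $key ({};
        . + {($key): ($in|map(.[$key]))})
      + {"ID": .[0].ID};
  [bucketize(.ID)[] | compact | del(.DATE)]')
echo "$result"
```

The result holds one object per ID, without DATE, so a value for a given hour is found purely by its position in the array.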

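Answer 2 (score: 0)

Here is a solution that uses reduce, setpath, getpath, del, and symbolic variable destructuring. It collects the values of all keys other than ID and DATE into parallel arrays (no need to hard-code VALUE_ONE, etc.).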

reduce (.[] | [.ID, .DATE, del(.ID,.DATE)]) as [$id,$date,$v] ({};
    (getpath([$id, "DATE"])|length) as $idx
  | setpath([$id, "ID"]; $id)
  | setpath([$id, "DATE", $idx]; $date)
  | reduce ($v|keys[]) as $k (.; setpath([$id, $k, $idx]; $v[$k]))
)
| map(.)
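One consequence of writing each value at position $idx is worth illustrating: a key that only appears in a later record is padded with null, so the arrays stay aligned by index. The sketch below runs the filter above unchanged on made-up input in which the first record has no VALUE_TWO. The shell variables `input` and `result` are illustrative only.

```shell
input='[{"DATE":"10.10.2017 01:00","ID":"X","VALUE_ONE":20},
        {"DATE":"10.10.2017 02:00","ID":"X","VALUE_ONE":30,"VALUE_TWO":7}]'

result=$(printf '%s' "$input" | jq -c '
  reduce (.[] | [.ID, .DATE, del(.ID,.DATE)]) as [$id,$date,$v] ({};
      (getpath([$id, "DATE"])|length) as $idx
    | setpath([$id, "ID"]; $id)
    | setpath([$id, "DATE", $idx]; $date)
    | reduce ($v|keys[]) as $k (.; setpath([$id, $k, $idx]; $v[$k]))
  )
  | map(.)')
echo "$result"
```

Here VALUE_TWO comes out as [null,7]. The reverse case is not padded: if a key is missing from the last records of an ID, its array is simply shorter than DATE.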


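Answer 3 (score: 0)

If your dataset is small enough, you can simply group the objects by ID and map them to the desired result. It won't be super efficient compared to a streaming solution, but it is the simplest approach using built-in functions.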

group_by(.ID) | map({
    DATE: map(.DATE),
    ID: .[0].ID,
    VALUE_ONE: map(.VALUE_ONE),
    VALUE_TWO: map(.VALUE_TWO)
})
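A quick way to check that nothing is lost by this compaction is to expand the result again and compare it with the input. The sketch below does that on a shortened input. The helper names `compactify` and `expand` and the shell variables are made up for this illustration, and both sides are sorted by ID and DATE before comparing because group_by reorders the records.

```shell
input='[{"DATE":"10.10.2017 01:00","ID":"X","VALUE_ONE":20,"VALUE_TWO":5},
        {"DATE":"10.10.2017 01:00","ID":"Y","VALUE_ONE":10,"VALUE_TWO":9},
        {"DATE":"10.10.2017 02:00","ID":"X","VALUE_ONE":30,"VALUE_TWO":7}]'

roundtrip_ok=$(printf '%s' "$input" | jq '
  # Group and compact, as in the filter above.
  def compactify: group_by(.ID) | map({
      DATE: map(.DATE), ID: .[0].ID,
      VALUE_ONE: map(.VALUE_ONE), VALUE_TWO: map(.VALUE_TWO)});
  # Hypothetical inverse: emit one object per array position.
  def expand: map(. as $o | range(0; $o.DATE | length) as $i
      | {DATE: $o.DATE[$i], ID: $o.ID,
         VALUE_ONE: $o.VALUE_ONE[$i], VALUE_TWO: $o.VALUE_TWO[$i]});
  (compactify | expand | sort_by([.ID, .DATE])) == sort_by([.ID, .DATE])')
echo "$roundtrip_ok"
```

The script prints true when the expanded data matches the original records.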