Writing JSON values from an array and a nested array to a single CSV

Time: 2017-04-23 19:46:41

Tags: python json csv pandas ijson

I have a JSON output from which I want to create a CSV file with two columns. The first column should contain the userId and the second the value of videoSeries. The output looks like this:

{
  "start": 1490383076,
  "stop": 1492975076,
  "events": [
    {
      "time": 1491294219,
      "customParameters": [
        {
          "group": "channelId",
          "item": "dr3"
        },
        {
          "group": "videoGenre",
          "item": "unknown"
        },
        {
          "group": "videoSeries",
          "item": "min-mor-er-pink"
        },
        {
          "group": "videoSlug",
          "item": "min-mor-er-pink"
        }
      ],
      "userId": "cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16"
    }
  ]
}

My CSV should look like this:

--------------------------------------------------------------
User ID                                       videoSeries
--------------------------------------------------------------
cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16    min-mor-er-pink
--------------------------------------------------------------

I tried to use ijson and pandas to get the desired output, but I cannot get the values from the two different arrays into a single CSV:

ValueError: could not broadcast input array from shape (2) into shape (3)

1 answer:

Answer 0 (score: 1)

Try this approach:

d is a dictionary built from your JSON:

In [150]: d
Out[150]:
{'events': [{'customParameters': [{'group': 'channelId', 'item': 'dr3'},
    {'group': 'videoGenre', 'item': 'unknown'},
    {'group': 'videoSeries', 'item': 'min-mor-er-pink'},
    {'group': 'videoSlug', 'item': 'min-mor-er-pink'}],
   'time': 1491294219,
   'userId': 'cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16'}],
 'start': 1490383076,
 'stop': 1492975076}
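For reference, a dictionary like d can be built with the standard-library json module. This is only a minimal sketch: it assumes the JSON text is valid and held in a string (the name `raw` is hypothetical). With a file on disk, `json.load` on an open file handle would do the same job.

```python
import json

# Hypothetical input: the JSON from the question held in a string.
raw = """
{
  "start": 1490383076,
  "stop": 1492975076,
  "events": [
    {
      "time": 1491294219,
      "customParameters": [
        {"group": "channelId", "item": "dr3"},
        {"group": "videoGenre", "item": "unknown"},
        {"group": "videoSeries", "item": "min-mor-er-pink"},
        {"group": "videoSlug", "item": "min-mor-er-pink"}
      ],
      "userId": "cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16"
    }
  ]
}
"""

# json.loads turns JSON objects into dicts and JSON arrays into lists.
d = json.loads(raw)

# Each event carries its userId next to a list of group/item pairs.
print(d["events"][0]["userId"])
```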

Solution:

In [153]: pd.io.json.json_normalize(d['events'], 'customParameters', ['userId']) \
     ...:   .query("group in ['videoSeries']")[['userId','item']]
     ...:
Out[153]:
                                       userId             item
2  cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16  min-mor-er-pink

If you need videoSeries as the column name:

In [154]: pd.io.json.json_normalize(d['events'], 'customParameters', ['userId']) \
     ...:   .query("group in ['videoSeries']")[['userId','item']] \
     ...:   .rename(columns={'item':'videoSeries'})
     ...:
Out[154]:
                                       userId      videoSeries
2  cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16  min-mor-er-pink
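
The question asks for a CSV file, so the last step would be to write the result out. The following is a sketch rather than a definitive version. It assumes pandas 1.0 or later, where the function is available as `pd.json_normalize` (the `pd.io.json.json_normalize` spelling was deprecated in that release). The output file name `video_series.csv` is an assumption, and plain boolean indexing stands in for the `query` call.

```python
import pandas as pd

# Same events as in d above (the top-level start/stop keys are omitted).
d = {
    "events": [
        {
            "time": 1491294219,
            "customParameters": [
                {"group": "channelId", "item": "dr3"},
                {"group": "videoGenre", "item": "unknown"},
                {"group": "videoSeries", "item": "min-mor-er-pink"},
                {"group": "videoSlug", "item": "min-mor-er-pink"},
            ],
            "userId": "cx:hr1y0kcbhhr61qj7kspglu767:344xy3wb5bz16",
        }
    ]
}

# One row per customParameters entry, with userId repeated on each row.
flat = pd.json_normalize(d["events"], "customParameters", ["userId"])

# Keep only the videoSeries rows and give the value column its final name.
result = (
    flat[flat["group"] == "videoSeries"][["userId", "item"]]
    .rename(columns={"item": "videoSeries"})
)

# index=False keeps the leftover row label out of the file.
# The file name is hypothetical.
result.to_csv("video_series.csv", index=False)
```

If an event has no videoSeries entry, it simply produces no row with this approach. Whether such users should still appear with an empty value depends on what the CSV is used for.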