Normalize JSON using a related key

Time: 2019-12-09 19:23:21

Tags: python json pandas

I have JSON like this:

in_str='''{
  "prices": [
    [1, 10],
    [2, 20],
    [3, 30]    
  ],
  "total_volumes": [
    [1, 100],
    [2, 200],
    [3, 300]
  ]
}'''

I am trying to produce a pandas DataFrame with 3 columns (id, price, volume):

1 10 100
2 20 200
3 30 300

I tried pandas.read_json(), but that gives me two columns, and I don't know where to go from there. json_normalize() only gives me back two rows.

import pandas as pd
import json
from pandas.io.json import json_normalize

in_str='''{
  "prices": [
    [1, 10],
    [2, 20],
    [3, 30]    
  ],
  "total_volumes": [
    [1, 100],
    [2, 200],
    [3, 300]
  ]
}'''

df = pd.read_json(in_str)
json_normalize(json.loads(in_str))

4 answers:

Answer 0: (score: 1)

We can read it as usual, then reshape each column and merge:

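from io import StringIO
import numpy as np
# Wrap the string in StringIO: newer pandas versions deprecate passing literal JSON.
df = pd.read_json(StringIO(in_str))
pd.merge(*[pd.DataFrame(np.array(df[col].to_list()),
                        columns=['id', col]) for col in df],
         on='id')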

Output:

   id  prices  total_volumes
0   1      10            100
1   2      20            200
2   3      30            300
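The unpacking in pd.merge(*[...]) relies on the JSON having exactly two keys. As a rough sketch of how the same idea could extend to more keys, the frames can be folded together with functools.reduce. The third key, "market_caps", is a hypothetical addition for illustration and is not part of the question's data.

```python
from functools import reduce
import json

import pandas as pd

# "market_caps" is a hypothetical third key, added only to show the general case.
in_str = '''{
  "prices": [[1, 10], [2, 20], [3, 30]],
  "total_volumes": [[1, 100], [2, 200], [3, 300]],
  "market_caps": [[1, 1000], [2, 2000], [3, 3000]]
}'''

data = json.loads(in_str)

# One two-column frame per key: the shared id plus that key's values.
frames = [pd.DataFrame(pairs, columns=["id", key]) for key, pairs in data.items()]

# Merge the frames pairwise on "id", left to right (inner join by default).
merged = reduce(lambda left, right: left.merge(right, on="id"), frames)
print(merged)
```

Because the default join is an inner join, an id that is missing from any one list would be dropped; passing how="outer" to merge would keep it.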

Answer 1: (score: 1)

pd.read_json has no orient parameter that fits this structure. It looks easier to convert the JSON directly and construct the DataFrame:

>>> pd.DataFrame({key: dict(value) for key, value in json.loads(in_str).items()})

   prices  total_volumes
1      10            100
2      20            200
3      30            300
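One property of this dict-of-dicts approach worth knowing: pandas aligns the inner dicts on their keys, so an id that appears in only one list ends up as NaN in the other column rather than raising an error. A small sketch, assuming made-up data where id 3 has no volume and id 4 has no price:

```python
import json

import pandas as pd

# Hypothetical input: the two lists do not cover the same ids.
uneven_str = '''{
  "prices": [[1, 10], [2, 20], [3, 30]],
  "total_volumes": [[1, 100], [2, 200], [4, 400]]
}'''

# dict(value) turns each [[id, x], ...] list into an {id: x} mapping.
uneven = {key: dict(value) for key, value in json.loads(uneven_str).items()}

# The index is the union of ids; gaps are filled with NaN.
df_uneven = pd.DataFrame(uneven).rename_axis("id").reset_index()
print(df_uneven)
```

Note that the columns become floats once a NaN is present, and that dict(value) keeps only the last value if an id is repeated within a list.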

Answer 2: (score: 1)

You can preprocess the JSON into a proper dictionary and then use it to construct the DataFrame:

import ast

d = ast.literal_eval(in_str)
d1 = {k: dict(v) for k, v in d.items()}
df = pd.DataFrame(d1).rename_axis('id').reset_index()

Out[857]:
   id  prices  total_volumes
0   1      10            100
1   2      20            200
2   3      30            300
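ast.literal_eval works here because this particular JSON happens to be a valid Python literal too. That is not true of JSON in general: true, false, and null are not Python literals. A minimal sketch of the difference, using a made-up string that contains those values:

```python
import ast
import json

# Hypothetical input containing JSON-only tokens (null, true).
text = '{"prices": [[1, 10], [2, null]], "active": true}'

# json.loads maps null -> None and true -> True.
parsed = json.loads(text)
print(parsed)

# ast.literal_eval rejects the same text because null/true are not Python literals.
try:
    ast.literal_eval(text)
    literal_eval_ok = True
except ValueError:
    literal_eval_ok = False
print(literal_eval_ok)
```

If the input might contain such values, swapping ast.literal_eval(in_str) for json.loads(in_str) in the answer above should leave the rest of the code unchanged.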

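Answer 3: (score: 1)

If you aren't already using pandas, you don't need to download it for this. Here is a solution that uses the built-in json parser to read the data and native data structures to process it into the shape you want (and possibly a more useful one).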

import json

in_str='''{
  "prices": [
    [1, 10],
    [2, 20],
    [3, 30]    
  ],
  "total_volumes": [
    [1, 100],
    [2, 200],
    [3, 300]
  ]
}'''

in_json = json.loads(in_str)
# If you're reading from a file, use json.load(f) with an open file object instead.
print(in_json)
'''
>>> {'prices': [[1, 10], [2, 20], [3, 30]], 'total_volumes': [[1, 100], [2, 200], [3, 300]]}
'''
# Here we're going to merge the two data sets to make them iterable in one go.
inventory = dict()
for item_id, price in in_json["prices"]:
  inventory[item_id] = {"price": price}

for item_id, volume in in_json["total_volumes"]:
  if isinstance(inventory.get(item_id), dict):
    inventory[item_id]["volume"] = volume
  else:
    inventory[item_id] = {"volume": volume}
print(inventory)
'''
>>> {1: {'price': 10, 'volume': 100}, 2: {'price': 20, 'volume': 200}, 3: {'price': 30, 'volume': 300}}
'''

# Now that the data is all in one dict, we can just iterate through it to get the rows in the shape that you want.
inventory_table = list()
for item_id, info in inventory.items():
  row = [item_id, info.get("price"), info.get("volume")]
  print(row)
  '''
  >>> [1, 10, 100]
  >>> [2, 20, 200]
  >>> [3, 30, 300]
  '''
  inventory_table.append(row)

# the final form
print(inventory_table)
'''
>>> [[1, 10, 100], [2, 20, 200], [3, 30, 300]]
'''

Now that we have this as a baseline, we can write some of the one-liners that people drool over in Python:

import json

in_str='''{
  "prices": [
    [1, 10],
    [2, 20],
    [3, 30]    
  ],
  "total_volumes": [
    [1, 100],
    [2, 200],
    [3, 300]
  ]
}'''

in_json = json.loads(in_str)

inventory = {item_id: {"price": price} for item_id, price in in_json["prices"]}

for item_id, volume in in_json["total_volumes"]:
  if isinstance(inventory.get(item_id), dict):
    inventory[item_id]["volume"] = volume
  else:
    inventory[item_id] = {"volume": volume}

print(inventory)

inventory_table = [[item_id, info.get("price"), info.get("volume")] for item_id, info in inventory.items()]
print(inventory_table)
'''
>>> [[1, 10, 100], [2, 20, 200], [3, 30, 300]]
'''
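The isinstance check in the second loop can be avoided with dict.setdefault, which also makes it easy to handle any number of keys rather than just two. The following is a sketch along those lines, not the answer's own code; merge_keyed_series is a hypothetical helper name, and it reuses the JSON key names as field names instead of "price" and "volume".

```python
import json

in_str = '''{
  "prices": [[1, 10], [2, 20], [3, 30]],
  "total_volumes": [[1, 100], [2, 200], [3, 300]]
}'''


def merge_keyed_series(data):
    """Turn {"key": [[id, value], ...], ...} into {id: {"key": value, ...}}."""
    merged = {}
    for key, pairs in data.items():
        for item_id, value in pairs:
            # setdefault creates the inner dict the first time an id is seen.
            merged.setdefault(item_id, {})[key] = value
    return merged


in_json = json.loads(in_str)
inventory = merge_keyed_series(in_json)

# One row per id, with columns in the same order as the JSON keys;
# info.get(key) yields None where an id has no value for that key.
inventory_table = [
    [item_id] + [info.get(key) for key in in_json]
    for item_id, info in inventory.items()
]
print(inventory_table)
```

With the question's data this prints the same three rows as the code above.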