Create a pandas DataFrame from JSON where the column headers and rows are in separate arrays

Time: 2014-09-30 12:11:39

Tags: python pandas

An API is sending me data in the following form:

{
    uselessInfo: blabla,
    headers: [
        {type:DIMENSION, name:DATE},
        {type:DIMENSION, name:COUNTRY},
        {type:METRIC, name:REVENUE}
    ],
    rows: [
        ["2014-09-29","Germany",435],
        ["2014-09-28","USA",657],
        ...
        ["2014-09-13","Spain",321]
    ],
    average: [ some unwanted info ],
    total: [ some unwanted info ]
}

I want to create a DataFrame in pandas from this object, using only:

  • the header info, to name my columns
  • the data rows
  • and ignoring everything else.

So far I have tried changing the parameters of pandas.read_json, but without any good results, and I couldn't find any similar examples.

1 answer:

Answer 0 (score: 2)

pandas.read_json cannot convert arbitrary JSON into a DataFrame. The JSON must be in one of the formats described in the docs under the orient parameter.
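For illustration, the orient format closest to this payload is "split", which expects an object with "columns" and "data" keys (and optionally "index"). A minimal sketch, assuming you are willing to reshape the payload first, keeps only the header names and rows and hands the result to read_json. The variable names are illustrative; convert_dates=False is passed so the DATE column is not reinterpreted, and io.StringIO is used because recent pandas versions deprecate passing a literal JSON string to read_json. It is a roundabout route compared to building the DataFrame directly.

```python
import io
import json

import pandas as pd

# Sample payload in the shape the API sends (keys quoted so it is valid JSON).
content = '''{
    "uselessInfo": "blabla",
    "headers": [
        {"type": "DIMENSION", "name": "DATE"},
        {"type": "DIMENSION", "name": "COUNTRY"},
        {"type": "METRIC", "name": "REVENUE"}
    ],
    "rows": [["2014-09-29", "Germany", 435],
             ["2014-09-28", "USA", 657],
             ["2014-09-13", "Spain", 321]],
    "average": ["some unwanted info"],
    "total": ["some unwanted info"]
}'''

raw = json.loads(content)

# Reshape into the "split" layout read_json understands:
# {"columns": [...], "data": [[...], ...]}; every other key is dropped.
split_payload = {
    "columns": [h["name"] for h in raw["headers"]],
    "data": raw["rows"],
}

# StringIO avoids the deprecated "literal JSON string" input path;
# convert_dates=False leaves the DATE column as plain strings.
df = pd.read_json(
    io.StringIO(json.dumps(split_payload)),
    orient="split",
    convert_dates=False,
)
print(df)
```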

Instead, use json.loads to convert the data into Python objects, then pick out the headers and rows to build the DataFrame:

import json
import pandas as pd

content = '''{
    "uselessInfo": "blabla", 
    "headers": [
        { "type": "DIMENSION", "name": "DATE" }, 
        { "type": "DIMENSION", "name": "COUNTRY" }, 
        { "type": "METRIC", "name": "REVENUE" }
    ],
    "rows": [ [ "2014-09-29", "Germany", 435 ], 
        [ "2014-09-28", "USA", 657 ], 
        [ "2014-09-13", "Spain", 321 ]
    ], 
    "average": [ "some unwanted info" ], 
    "total": [ "some unwanted info" ]
}'''
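# Parse the JSON text into Python dicts and lists
data = json.loads(content)

# Column names come from the "name" field of each header
columns = [dct['name'] for dct in data['headers']]
# Only the rows are used; uselessInfo, average and total are ignored
df = pd.DataFrame(data['rows'], columns=columns)
print(df)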

This yields:

         DATE  COUNTRY  REVENUE
0  2014-09-29  Germany      435
1  2014-09-28      USA      657
2  2014-09-13    Spain      321
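Note that DATE comes out as plain strings. If you receive this payload repeatedly, the same steps can be wrapped in a small function that also fixes up column types. The sketch below is only one way to do it: payload_to_frame is a hypothetical name, and it assumes that headers with type METRIC hold numbers and that a column named DATE holds ISO dates (YYYY-MM-DD). Neither assumption is guaranteed by the API shown in the question.

```python
import json

import pandas as pd


def payload_to_frame(text: str) -> pd.DataFrame:
    """Hypothetical helper: build a DataFrame from a headers/rows payload.

    Assumes every header dict has "name" and "type" keys and that each row
    lists its values in the same order as the headers.
    """
    payload = json.loads(text)
    headers = payload["headers"]
    df = pd.DataFrame(payload["rows"], columns=[h["name"] for h in headers])

    # Assumption: METRIC headers mark numeric columns.
    for h in headers:
        if h["type"] == "METRIC":
            df[h["name"]] = pd.to_numeric(df[h["name"]])

    # Assumption: a column literally named DATE holds YYYY-MM-DD strings.
    if "DATE" in df.columns:
        df["DATE"] = pd.to_datetime(df["DATE"], format="%Y-%m-%d")
    return df


# Usage with the sample payload; the unwanted keys are simply never read.
sample = json.dumps({
    "uselessInfo": "blabla",
    "headers": [
        {"type": "DIMENSION", "name": "DATE"},
        {"type": "DIMENSION", "name": "COUNTRY"},
        {"type": "METRIC", "name": "REVENUE"},
    ],
    "rows": [["2014-09-29", "Germany", 435],
             ["2014-09-28", "USA", 657],
             ["2014-09-13", "Spain", 321]],
    "average": ["some unwanted info"],
    "total": ["some unwanted info"],
})
df = payload_to_frame(sample)
print(df.dtypes)
```

Driving the numeric conversion from the header "type" field rather than hard-coding REVENUE means new metrics added by the API would be picked up without code changes, provided the assumption about METRIC columns holds.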