Processing order JSON data with Pandas

Date: 2019-01-09 11:04:29

Tags: python pandas dataframe

I have order data in JSON, as shown below:

[
    {
        "id": 640197558336,
        "line_items": [
            {
                "id": 1501742661696,
                "variant_id": 19490901426240,
                "title": "\"Acrylic Bag\"",
                "quantity": 1
            },
            {
                "id": 1501742661695,
                "variant_id": 19490901426245,
                "title": "\"Trash Can\"",
                "quantity": 2
            }
        ]
    },
    {
        "id": 640197558337,
        "line_items": [
            {
                "id": 1501742661699,
                "variant_id": 19490901426249,
                "title": "\"Sports headphones\"",
                "quantity": 5
            },
            {
                "id": 1501742661695,
                "variant_id": 19490901426245,
                "title": "\"Trash Can\"",
                "quantity": 6
            }
        ]
    }
]

I used pandas to read the JSON and inspect it as a DataFrame:

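import pandas as pd

desired_width = 920
file_name = "trimmedorders"
file_ext = ".json"
# Widen the console output so all columns fit on one line
pd.set_option('display.width', desired_width)

# Each top-level JSON object (one order) becomes one row
df = pd.read_json(file_name + file_ext, orient='columns')
print(df.head())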

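The resulting DataFrame has one row per order: an id column and a line_items column, where each cell still holds that order's whole list of line-item dicts.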
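To see why this result is not flat yet, here is a minimal sketch that reads a trimmed copy of the sample JSON (one order, two items) and inspects the line_items column. It assumes the JSON is inlined as a string and wrapped in io.StringIO, purely so the snippet runs without the original trimmedorders.json file.

```python
import io
import pandas as pd

# Trimmed copy of the question's JSON, inlined (assumption) so the demo is self-contained
raw = """[{"id": 640197558336, "line_items": [
    {"id": 1501742661696, "variant_id": 19490901426240, "title": "\\"Acrylic Bag\\"", "quantity": 1},
    {"id": 1501742661695, "variant_id": 19490901426245, "title": "\\"Trash Can\\"", "quantity": 2}]}]"""

df = pd.read_json(io.StringIO(raw), orient="columns")

print(df.columns.tolist())            # one column for the order id, one for the nested list
print(type(df.loc[0, "line_items"]))  # each cell holds the raw Python list of item dicts
print(len(df.loc[0, "line_items"]))   # both line items are packed into a single cell
```

read_json only flattens the top level, so the nested items have to be expanded separately, which is what both answers below do.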

Could someone help me convert it into a flat format, with one row per line item and the parent order's id repeated on each row?

Any help or suggestions would be much appreciated. Thanks.

2 answers:

Answer 0 (score: 1)

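One approach is to preprocess the JSON into a flat list of dicts, one per line item, before building the DataFrame.

For example: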

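import json
import pandas as pd

# Load the raw order list (same file as in the question)
with open("trimmedorders.json") as f:
    data = json.load(f)

# One flat dict per line item, repeating the parent order id on each
rows = [{"id": i["id"], "line_items_id": j["id"], "line_items_variant_id": j["variant_id"],
         "line_items_title": j["title"], "line_items_quantity": j["quantity"]}
        for i in data for j in i["line_items"]]

df = pd.DataFrame(rows)
print(df)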

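Output (columns sorted alphabetically, as pandas did at the time; recent pandas versions keep the dicts' key order instead):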

             id  line_items_id  line_items_quantity     line_items_title  \
0  640197558336  1501742661696                    1        "Acrylic Bag"   
1  640197558336  1501742661695                    2          "Trash Can"   
2  640197558337  1501742661699                    5  "Sports headphones"   
3  640197558337  1501742661695                    6          "Trash Can"   

   line_items_variant_id  
0         19490901426240  
1         19490901426245  
2         19490901426249  
3         19490901426245  
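A variation on the same preprocessing idea, sketched under the assumption that every line item should keep all of its keys: build each row with dict unpacking, so new item fields are picked up without editing the comprehension. flatten_orders is a hypothetical helper name, and data here is a trimmed inline copy of the question's orders rather than the file.

```python
import pandas as pd

# Trimmed inline copy of the question's orders (already parsed into Python objects)
data = [
    {"id": 640197558336, "line_items": [
        {"id": 1501742661696, "variant_id": 19490901426240, "title": '"Acrylic Bag"', "quantity": 1},
        {"id": 1501742661695, "variant_id": 19490901426245, "title": '"Trash Can"', "quantity": 2},
    ]},
    {"id": 640197558337, "line_items": [
        {"id": 1501742661699, "variant_id": 19490901426249, "title": '"Sports headphones"', "quantity": 5},
    ]},
]

def flatten_orders(orders, list_key="line_items"):
    """Hypothetical helper: one flat dict per item, each item key prefixed with '<list_key>_'."""
    return [
        {"id": order["id"], **{f"{list_key}_{k}": v for k, v in item.items()}}
        for order in orders
        for item in order[list_key]
    ]

df = pd.DataFrame(flatten_orders(data))
print(df)
```

The trade-off is that every key in each item becomes a column, including ones you might not want; the explicit comprehension above gives tighter control over which fields are kept.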

Answer 1 (score: 1)

Use pd.io.json.json_normalize (available as pd.json_normalize since pandas 1.0) with the meta and record_path keywords:

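import json
import pandas as pd

with open("trimmedorders.json") as f:
    data = json.load(f)
# pandas >= 1.0: pd.json_normalize (older versions: from pandas.io.json import json_normalize)
df = pd.concat([
    pd.json_normalize(row, record_path=["line_items"], record_prefix="line_item_", meta="id")
    for row in data
])
print(df)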

    line_item_id  line_item_quantity      line_item_title  \
0  1501742661696                   1        "Acrylic Bag"   
1  1501742661695                   2          "Trash Can"   
0  1501742661699                   5  "Sports headphones"   
1  1501742661695                   6          "Trash Can"   

   line_item_variant_id            id  
0        19490901426240  640197558336  
1        19490901426245  640197558336  
0        19490901426249  640197558337  
1        19490901426245  640197558337  

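Note that this is more expensive than @Rakesh's solution, because it calls json_normalize once per order and each call builds a separate DataFrame that pd.concat then has to stitch together.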
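json_normalize also accepts the whole list of orders, so a single call can replace the per-order loop. This is a minimal sketch assuming pandas 1.0 or later (for pd.json_normalize) and an inline copy of the question's data instead of the file:

```python
import pandas as pd  # pd.json_normalize is top-level in pandas >= 1.0

# Inline copy of the question's two orders
data = [
    {"id": 640197558336, "line_items": [
        {"id": 1501742661696, "variant_id": 19490901426240, "title": '"Acrylic Bag"', "quantity": 1},
        {"id": 1501742661695, "variant_id": 19490901426245, "title": '"Trash Can"', "quantity": 2},
    ]},
    {"id": 640197558337, "line_items": [
        {"id": 1501742661699, "variant_id": 19490901426249, "title": '"Sports headphones"', "quantity": 5},
        {"id": 1501742661695, "variant_id": 19490901426245, "title": '"Trash Can"', "quantity": 6},
    ]},
]

# One call over the whole list: record_path walks into each order's "line_items",
# and meta copies the parent order's "id" onto every item row.
df = pd.json_normalize(data, record_path="line_items", record_prefix="line_item_", meta="id")
print(df)
```

Because only one DataFrame is built, this avoids the per-order overhead, and the index runs 0 to 3 instead of restarting at 0 for each order.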