CSV to JSON with column data that needs to be grouped

Time: 2017-04-01 10:57:51

Tags: python pandas

I have a CSV file similar to this:

order_id, customer_name, item_1_id, item_1_quantity, Item_2_id, Item_2_quantity, Item_3_id, Item_3_quantity
1,        John,          4,         1,               24,        4,               16,        1
2,        Paul,          8,         3,               41,        1,               33,        1
3,        Andrew,        1,         1,               34,        4,               8,          2

I want to export it to JSON, and currently I am doing this:

import pandas as pd

df = pd.read_csv('simple.csv')
print(df.to_json(orient='records'))

Output:

[
    {
        "Item_2_id": 24,
        "Item_2_quantity": 4,
        "Item_3_id": 16,
        "Item_3_quantity": 1,
        "customer_name": "John",
        "item_1_id": 4,
        "item_1_quantity": 1,
        "order_id": 1
    },
......

However, I would like the items of each order to be grouped together in the output, rather than spread across separate keys.

Any suggestions for a good way to do this?

In this particular project, there will be no more than 5 items per order.

3 Answers:

Answer 0: (score: 0)

Source DF:

In [168]: df
Out[168]:
   order_id customer_name  item_1_id  item_1_quantity  Item_2_id  Item_2_quantity  Item_3_id  Item_3_quantity
0         1          John          4                1         24                4         16                1
1         2          Paul          8                3         41                1         33                1
2         3        Andrew          1                1         34                4          8                2

Solution:

In [169]: %paste
import re

x = df[['order_id','customer_name']].copy()
x['id'] = \
    pd.Series(df.loc[:, df.columns.str.contains(r'item_.*?_id',
                                                flags=re.I)].values.tolist(),
              index=df.index)
x['quantity'] = \
    pd.Series(df.loc[:, df.columns.str.contains(r'item_.*?_quantity',
                                                flags=re.I)].values.tolist(),
              index=df.index)
x.to_json(orient='records')
## -- End pasted text --
Out[169]: '[{"order_id":1,"customer_name":"John","id":[4,24,16],"quantity":[1,4,1]},{"order_id":2,"customer_name":"Paul","id":[8,41,33],"quantity":[3,1,1]},{"order_id":3,"customer_name":"Andrew","id":[1,34,8],"quantity":[1,4,2]}]'

Intermediate helper DF:

In [82]: x
Out[82]:
   order_id customer_name           id   quantity
0         1          John  [4, 24, 16]  [1, 4, 1]
1         2          Paul  [8, 41, 33]  [3, 1, 1]
2         3        Andrew   [1, 34, 8]  [1, 4, 2]
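
To make the column selection in the solution above easier to follow, here is a minimal sketch of just that step. It assumes the sample data is built inline (with header names already free of stray spaces) rather than read from simple.csv; the names id_mask, id_columns and ids_per_row are illustrative only.

```python
import re

import pandas as pd

# Sample data from the question, built inline (assumption: clean header names).
df = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_name": ["John", "Paul", "Andrew"],
    "item_1_id": [4, 8, 1],
    "item_1_quantity": [1, 3, 1],
    "Item_2_id": [24, 41, 34],
    "Item_2_quantity": [4, 1, 4],
    "Item_3_id": [16, 33, 8],
    "Item_3_quantity": [1, 1, 2],
})

# Boolean mask over the column labels. flags=re.I makes the match
# case-insensitive, so "item_1_id" and "Item_2_id" are both selected.
id_mask = df.columns.str.contains(r"item_.*?_id", flags=re.I)
id_columns = df.columns[id_mask].tolist()

# One list of ids per row, in column order.
ids_per_row = df.loc[:, id_mask].values.tolist()

print(id_columns)
print(ids_per_row)
```

Note that "order_id" is not selected, because the pattern requires the literal text "item_" before the "_id" suffix.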

Answer 1: (score: 0)

Try the following:

import pandas as pd
import json

output_lst = []

## specify the first row as header
df = pd.read_csv('simple.csv', header=0)
## column_list is a list of column headers
column_list = df.columns.values
## iterate through all the rows
for index, row in df.iterrows():
    ## named "record" so the built-in dict type is not shadowed
    record = {}
    items_lst = []
    for i, col_name in enumerate(column_list):
        ## for the first 2 columns simply copy the value into the dictionary
        if i < 2:
            element = row[col_name]
            if isinstance(element, str):
                ## strip if it is a string type value
                element = element.strip()
            record[col_name] = element

        elif "_id" in col_name:
            ## i+1 is used assuming that the item_quantity comes right after the corresponding item_id for each item
            item_dict = {"id": row[col_name], "quantity": row[column_list[i+1]]}
            items_lst.append(item_dict)

    record["items"] = items_lst
    output_lst.append(record)

## print() call form works on both Python 2 and Python 3
print(json.dumps(output_lst))

Running the above on the simple.csv file described in the question gives the following output:

[
    {
        "order_id": 1,
        "items": [
            {
                "id": 4,
                "quantity": 1
            },
            {
                "id": 24,
                "quantity": 4
            },
            {
                "id": 16,
                "quantity": 1
            }
        ],
        " customer_name": "John"
    },
    {
        "order_id": 2,
        "items": [
            {
                "id": 8,
                "quantity": 3
            },
            {
                "id": 41,
                "quantity": 1
            },
            {
                "id": 33,
                "quantity": 1
            }
        ],
        " customer_name": "Paul"
    },
    {
        "order_id": 3,
        "items": [
            {
                "id": 1,
                "quantity": 1
            },
            {
                "id": 34,
                "quantity": 4
            },
            {
                "id": 8,
                "quantity": 2
            }
        ],
        " customer_name": "Andrew"
    }
]
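
The key " customer_name" in the output above carries a leading space because the CSV header has blanks after each comma. One possible way to avoid that, sketched here under the assumption that the file looks exactly like the sample in the question, is to pass skipinitialspace=True to pd.read_csv. The CSV text is inlined through io.StringIO so the sketch is self-contained, and the names CSV_TEXT and records are illustrative only.

```python
import io
import json

import pandas as pd

# Sample CSV from the question, inlined so the sketch runs without a file.
CSV_TEXT = """order_id, customer_name, item_1_id, item_1_quantity, Item_2_id, Item_2_quantity, Item_3_id, Item_3_quantity
1,        John,          4,         1,               24,        4,               16,        1
2,        Paul,          8,         3,               41,        1,               33,        1
3,        Andrew,        1,         1,               34,        4,               8,          2
"""

# skipinitialspace=True drops the blanks that follow each comma, so the
# header parses as "customer_name" rather than " customer_name".
df = pd.read_csv(io.StringIO(CSV_TEXT), skipinitialspace=True)

records = []
for row in df.to_dict(orient="records"):
    record = {
        "order_id": int(row["order_id"]),
        "customer_name": row["customer_name"],
    }
    # Pair each *_id column with the *_quantity column sharing its prefix,
    # instead of relying on the quantity column being the next one.
    record["items"] = [
        {
            "id": int(row[col]),
            "quantity": int(row[col[:-len("_id")] + "_quantity"]),
        }
        for col in df.columns
        if col.lower().startswith("item_") and col.endswith("_id")
    ]
    records.append(record)

print(json.dumps(records, indent=4))
```

The explicit int() conversions are a precaution so that json.dumps always receives plain Python integers.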

Answer 2: (score: 0)

j = df.set_index(['order_id','customer_name']) \
      .groupby(lambda x: x.split('_')[-1], axis=1) \
      .agg(lambda x: x.values.tolist()) \
      .reset_index() \
      .to_json(orient='records')

import json

Beautified result:

In [122]: print(json.dumps(json.loads(j), indent=2))
[
  {
    "order_id": 1,
    "customer_name": "John",
    "id": [
      4,
      24,
      16
    ],
    "quantity": [
      1,
      4,
      1
    ]
  },
  {
    "order_id": 2,
    "customer_name": "Paul",
    "id": [
      8,
      41,
      33
    ],
    "quantity": [
      3,
      1,
      1
    ]
  },
  {
    "order_id": 3,
    "customer_name": "Andrew",
    "id": [
      1,
      34,
      8
    ],
    "quantity": [
      1,
      4,
      2
    ]
  }
]
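
Recent pandas releases deprecate grouping along axis=1, so the groupby call above may emit a warning or fail there. The following is a rough sketch of the same grouping-by-suffix idea without groupby, not the answerer's method. It assumes clean header names and builds the sample data inline; key_cols, item_cols and out are illustrative names.

```python
import json

import pandas as pd

# Sample data from the question, built inline (assumption: clean header names).
df = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_name": ["John", "Paul", "Andrew"],
    "item_1_id": [4, 8, 1],
    "item_1_quantity": [1, 3, 1],
    "Item_2_id": [24, 41, 34],
    "Item_2_quantity": [4, 1, 4],
    "Item_3_id": [16, 33, 8],
    "Item_3_quantity": [1, 1, 2],
})

key_cols = ["order_id", "customer_name"]
item_cols = [c for c in df.columns if c not in key_cols]

out = df[key_cols].copy()
# Group the remaining columns by the text after the last underscore
# ("id" or "quantity") and collect each group's values row by row.
for suffix in sorted({c.split("_")[-1] for c in item_cols}):
    cols = [c for c in item_cols if c.split("_")[-1] == suffix]
    out[suffix] = pd.Series(df[cols].values.tolist(), index=df.index)

j = out.to_json(orient="records")
print(json.dumps(json.loads(j), indent=2))
```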