Question

I have a CSV of records:

name,credits,email
bob,,test1@foo.com
bob,6.0,test@foo.com
bill,3.0,something_else@a.com
bill,4.0,something@a.com
tammy,5.0,hello@gmail.org

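where name is the index. Because several records share the same name, I'd like to roll each entire row (minus the name) into a list, to create JSON of the form: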

{
  "bob": [
      { "credits": null, "email": "test1@foo.com"},
      { "credits": 6.0, "email": "test@foo.com" }
  ], 
  // ...
}

My current solution is a bit kludgy, since it seems to use pandas only as a tool for reading the CSV, but it does produce the JSON-ish output I expect:

#!/usr/bin/env python3

import io
import pandas as pd
from pprint import pprint
from collections import defaultdict

def read_data():
    s = """name,credits,email
bob,,test1@foo.com
bob,6.0,test@foo.com
bill,3.0,something_else@a.com
bill,4.0,something@a.com
tammy,5.0,hello@gmail.org
"""

    data = io.StringIO(s)
    return pd.read_csv(data)

if __name__ == "__main__":
    df = read_data()
    columns = df.columns
    index_name = "name"
    print(df.head())

    records = defaultdict(list)

    name_index = list(columns.values).index(index_name)
    columns_without_index = [column for i, column in enumerate(columns) if i != name_index]

    for record in df.values:
        name = record[name_index]
        record_without_index = [field for i, field in enumerate(record) if i != name_index]
        remaining_record = {k: v for k, v in zip(columns_without_index, record_without_index)}
        records[name].append(remaining_record)
    pprint(dict(records))

Is there a way to do the same thing in native pandas (and numpy)?

Answer 1

Is this what you want?

cols = df.columns.drop('name').tolist()

or, as @jezrael suggested:

cols = df.columns.difference(['name'])
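
One difference between the two spellings is worth knowing: `Index.drop` keeps the original column order, while `Index.difference` returns a sorted result by default. With the question's columns (`credits`, `email`) both happen to give the same order, so the small sketch below uses a hypothetical frame with the columns reordered, just to make the difference visible:

```python
import pandas as pd

# Hypothetical empty frame; the column order differs from the question's CSV
# on purpose, so the two approaches produce different results.
df = pd.DataFrame(columns=["name", "email", "credits"])

# drop() preserves the original column order.
cols_drop = df.columns.drop("name").tolist()

# difference() is a set operation and sorts its result by default.
cols_diff = df.columns.difference(["name"]).tolist()

print(cols_drop)  # ['email', 'credits']
print(cols_diff)  # ['credits', 'email']
```

If the key order inside each record matters to you, `drop` is probably the safer choice.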

Then:

s = df.groupby('name')[cols].apply(lambda x: x.to_dict('records')).to_json()

Let's pretty-print it:

In [45]: print(json.dumps(json.loads(s), indent=2))
{
  "bill": [
    {
      "credits": 3.0,
      "email": "something_else@a.com"
    },
    {
      "credits": 4.0,
      "email": "something@a.com"
    }
  ],
  "bob": [
    {
      "credits": null,
      "email": "test1@foo.com"
    },
    {
      "credits": 6.0,
      "email": "test@foo.com"
    }
  ],
  "tammy": [
    {
      "credits": 5.0,
      "email": "hello@gmail.org"
    }
  ]
}
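
For completeness, here is a self-contained sketch that puts the pieces above together, using the sample data from the question. The names `CSV_TEXT`, `grouped` and `result` are only illustrative. It spells the orient out as `'records'`, since recent pandas versions no longer accept the abbreviated form:

```python
import io
import json

import pandas as pd

# Sample data copied from the question.
CSV_TEXT = """name,credits,email
bob,,test1@foo.com
bob,6.0,test@foo.com
bill,3.0,something_else@a.com
bill,4.0,something@a.com
tammy,5.0,hello@gmail.org
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Every column except the grouping key, in original order.
cols = df.columns.drop("name").tolist()

# One list of row dicts per name; the result is a Series indexed by name.
grouped = df.groupby("name")[cols].apply(lambda g: g.to_dict("records"))

# to_json() serialises NaN as null, so after a round trip through json.loads
# the missing credits value comes back as None.
result = json.loads(grouped.to_json())

print(json.dumps(result, indent=2))
```

Going through `to_json()` rather than building the dict directly is what turns the missing `credits` value into `null`; a plain `to_dict('records')` would leave it as a float `nan`.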

How do I create lists of records grouped by index in Pandas?
