I have a CSV of records:
name,credits,email
bob,,test1@foo.com
bob,6.0,test@foo.com
bill,3.0,something_else@a.com
bill,4.0,something@a.com
tammy,5.0,hello@gmail.org
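where name is the index. Because there are multiple records with the same name, I'd like to roll each entire row (minus the name) into a list, to create JSON of the form: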
{
  "bob": [
    { "credits": null, "email": "test1@foo.com" },
    { "credits": 6.0, "email": "test@foo.com" }
  ],
  // ...
}
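My current solution is a bit kludgy, since it seems to use pandas only as a tool for reading the CSV, but it nonetheless produces my expected JSON-ish output: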
#!/usr/bin/env python3
import io
import pandas as pd
from pprint import pprint
from collections import defaultdict
def read_data():
    s = """name,credits,email
bob,,test1@foo.com
bob,6.0,test@foo.com
bill,3.0,something_else@a.com
bill,4.0,something@a.com
tammy,5.0,hello@gmail.org
"""
    data = io.StringIO(s)
    return pd.read_csv(data)

if __name__ == "__main__":
    df = read_data()
    columns = df.columns
    index_name = "name"
    print(df.head())
    records = defaultdict(list)
    # Position of the "name" column, so it can be skipped in each row
    name_index = list(columns.values).index(index_name)
    columns_without_index = [column for i, column in enumerate(columns) if i != name_index]
    for record in df.values:
        name = record[name_index]
        record_without_index = [field for i, field in enumerate(record) if i != name_index]
        remaining_record = {k: v for k, v in zip(columns_without_index, record_without_index)}
        records[name].append(remaining_record)
    pprint(dict(records))
Is there a way to do the same thing in native pandas (and numpy)?
Answer 0 (score: 4)
Is this what you want?
cols = df.columns.drop('name').tolist()
or, as @jezrael suggested:
cols = df.columns.difference(['name'])
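One difference between the two variants is worth keeping in mind: `Index.drop` keeps the original column order, while `Index.difference` sorts its result by default. For the sample CSV both give the same list, because `credits` and `email` are already in alphabetical order. A minimal sketch, using a made-up one-row frame whose columns are deliberately out of order:

```python
import pandas as pd

# Hypothetical frame (not from the question): non-key columns are NOT alphabetical.
df_demo = pd.DataFrame(
    {"name": ["bob"], "email": ["test@foo.com"], "credits": [6.0]}
)

# drop() preserves the order in which the columns appear in the frame.
cols_drop = df_demo.columns.drop("name").tolist()

# difference() behaves like a set operation and sorts the result by default.
cols_diff = df_demo.columns.difference(["name"]).tolist()

# Passing sort=False asks difference() to keep the original order instead.
cols_diff_unsorted = df_demo.columns.difference(["name"], sort=False).tolist()

print(cols_drop)
print(cols_diff)
print(cols_diff_unsorted)
```

If the key order inside each JSON record matters to you, `drop` (or `difference(..., sort=False)`) is probably the safer choice.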
Then:
s = df.groupby('name')[cols].apply(lambda x: x.to_dict('records')).to_json()
Let's pretty-print it:
In [45]: print(json.dumps(json.loads(s), indent=2))
{
  "bill": [
    {
      "credits": 3.0,
      "email": "something_else@a.com"
    },
    {
      "credits": 4.0,
      "email": "something@a.com"
    }
  ],
  "bob": [
    {
      "credits": null,
      "email": "test1@foo.com"
    },
    {
      "credits": 6.0,
      "email": "test@foo.com"
    }
  ],
  "tammy": [
    {
      "credits": 5.0,
      "email": "hello@gmail.org"
    }
  ]
}
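For convenience, here is the whole thing as one self-contained script. Treat it as a sketch rather than the definitive way to do this: the helper name `group_rows_to_json` and its `key` parameter are made up for illustration, and it spells out `'records'` in full, since the short orient aliases are not accepted by newer pandas releases.

```python
import io
import json

import pandas as pd

CSV_TEXT = """name,credits,email
bob,,test1@foo.com
bob,6.0,test@foo.com
bill,3.0,something_else@a.com
bill,4.0,something@a.com
tammy,5.0,hello@gmail.org
"""


def group_rows_to_json(df, key="name"):
    """Hypothetical helper: group rows by `key` and serialize each group
    as a list of records (without the key column) in one JSON object."""
    # Every column except the grouping key, in original order.
    cols = df.columns.drop(key).tolist()
    # One list of row-dicts per group; the result is a Series indexed by `key`.
    grouped = df.groupby(key)[cols].apply(lambda g: g.to_dict("records"))
    # Series.to_json maps index labels to values and writes NaN as null.
    return grouped.to_json()


df = pd.read_csv(io.StringIO(CSV_TEXT))
result = json.loads(group_rows_to_json(df))
print(json.dumps(result, indent=2))
```

Note that the missing `credits` value for the first `bob` row is read as NaN by `read_csv` and comes out as `null` in the JSON, which matches the output the question asks for.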