I currently have a CSV file like this:
A B C
1 10 {"a":"one","b":"two","c":"three"}
1 10 {"a":"four","b":"five","c":"six"}
1 10 {"a":"seven","b":"eight","c":"nine"}
1 10 {"a":"ten","b":"eleven","c":"twelve"}
2 10 {"a":"thirteen","b":"fourteen","c":"fifteen"}
2 10 {"a":"sixteen","b":"seventeen","c":"eighteen"}
2 10 {"a":"nineteen","b":"twenty","c":"twenty-one"}
3 10 {"a":"twenty-two","b":"twenty-three","c":"twenty-four"}
3 10 {"a":"twenty-five","b":"twenty-six","c":"twenty-seven"}
3 10 {"a":"twenty-eight","b":"twenty-nine","c":"thirty"}
3 10 {"a":"thirty-one","b":"thirty-two","c":"thirty-three"}
I want to group by column A, ignore column B, take only the "b" field from C, and get output like this:
A C
1 ['two','five','eight','eleven']
2 ['fourteen','seventeen','twenty']
3 ['twenty-three','twenty-six','twenty-nine','thirty-two']
Is there a way to do this? I have pandas available, if that helps. I would also like the output file to be tab-delimited.
Answer 0 (score: 0)
Try this:
import json
import pandas as pd
# read the whitespace-delimited file shown above
# (sep=r"\s+" is equivalent to delim_whitespace=True, which newer pandas releases deprecate)
df = pd.read_csv("file.csv", sep=r"\s+")
# drop the 'B' column
del df['B']
# 'C' starts out as a JSON string; parse it and keep only the 'b' value
df['C'] = df['C'].map(lambda x: json.loads(x)['b'])
# 'C' now holds just the 'b' values; collect them into one list per value of 'A'
df = df.groupby('A').C.apply(list)
print(df)
This returns:
A
1 [two, five, eight, eleven]
2 [fourteen, seventeen, twenty]
3 [twenty-three, twenty-six, twenty-nine, thirty...
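The question also asks for tab-delimited output, which the snippet above prints but does not write. The following is a minimal sketch of one way to add that step, not the answerer's own code. It assumes a shortened in-memory copy of the data (`SAMPLE`) in place of `file.csv`, and the names `grouped`, `buffer` and `tsv_text` are illustrative only. Passing `quoting=csv.QUOTE_NONE` is an extra precaution so that the double quotes inside the JSON text are kept literally.

```python
import csv
import io
import json

import pandas as pd

# Hypothetical stand-in for file.csv: the first few rows from the question.
SAMPLE = '''A B C
1 10 {"a":"one","b":"two","c":"three"}
1 10 {"a":"four","b":"five","c":"six"}
2 10 {"a":"thirteen","b":"fourteen","c":"fifteen"}
'''

# QUOTE_NONE keeps the double quotes inside the JSON text as ordinary characters.
df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+", quoting=csv.QUOTE_NONE)

# Parse each JSON string and keep only its "b" value.
df["C"] = df["C"].map(lambda s: json.loads(s)["b"])

# One list of "b" values per value of A; reset_index turns A back into a column.
grouped = df.groupby("A")["C"].apply(list).reset_index()

# Write tab-delimited text. A StringIO buffer is used here so the sketch
# has no side effects; a file path can be passed to to_csv instead.
buffer = io.StringIO()
grouped.to_csv(buffer, sep="\t", index=False)
tsv_text = buffer.getvalue()
print(tsv_text)
```

Each list is written using its Python string form, so a cell looks like `['two', 'five']`, which matches the layout requested in the question.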
Answer 1 (score: 0)
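IIUC, and assuming column C already holds parsed dicts (if it still holds JSON strings, parse them with json.loads first):
df.groupby('A').C.apply(lambda x: [y['b'] for y in x])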
A
1 [two, five, eight, eleven]
2 [fourteen, seventeen, twenty]
3 [twenty-three, twenty-six, twenty-nine, thirty...
Name: C, dtype: object
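The one-liner above indexes each value with `['b']`, which works only when C holds dicts. If C still contains the raw JSON strings that `read_csv` produces, the same grouping can parse them inside the comprehension. This is a hedged sketch rather than the answerer's code: the small frame and the helper name `b_values` are made up for illustration.

```python
import json

import pandas as pd

# Hypothetical frame in which C still holds raw JSON strings, as after read_csv.
df = pd.DataFrame({
    "A": [1, 1, 2],
    "C": [
        '{"a":"one","b":"two","c":"three"}',
        '{"a":"four","b":"five","c":"six"}',
        '{"a":"thirteen","b":"fourteen","c":"fifteen"}',
    ],
})


def b_values(group):
    """Return the "b" field of every JSON string in one group."""
    return [json.loads(text)["b"] for text in group]


# Same groupby/apply shape as the one-liner, with parsing added.
result = df.groupby("A")["C"].apply(b_values)
print(result)
```

Parsing inside the helper leaves the original C column untouched, which may be convenient if other fields of the JSON are needed later.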