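I have a CSV file with repeated values in its first column. For each distinct value in the first column, I want to collect all the corresponding second-column values in a list.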
column1 column2
a 54.2
s 78.5
k 89.62
a 77.2
a 65.56
I would like to get something like:
print a # [54.2,77.2,65.56]
print s # [78.5]
print k # [89.62]
Answer 0 (score: 4)
This seems fairly straightforward with Python's csv reader.
data.csv
a,54.2
s,78.5
k,89.62
a,77.2
a,65.56
script.py
import csv

result = {}
# newline='' lets the csv module handle line endings itself (Python 3)
with open('data.csv', newline='') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',', quotechar='"')
    for row in csvreader:
        # row[0] is the key, row[1] is the value (still a string)
        if row[0] in result:
            result[row[0]].append(row[1])
        else:
            result[row[0]] = [row[1]]
print(result)
Output
{
'a': ['54.2', '77.2', '65.56'],
's': ['78.5'],
'k': ['89.62']
}
As @Pete said, you can tidy it up with defaultdict:
script.py
import csv
from collections import defaultdict

result = defaultdict(list)  # each entry of the dict is, by default, an empty list
with open('data.csv', newline='') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',', quotechar='"')
    for row in csvreader:
        result[row[0]].append(row[1])
print(result)
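Note that the csv reader yields strings, while the question asks for numbers. A minimal sketch of one way to get floats, written for Python 3: the helper name `group_as_floats`, its `has_header` flag, and the in-memory `sample` (standing in for the opened file, with a header row assumed) are illustrative and not part of the original answer.

```python
import csv
import io
from collections import defaultdict

# Stand-in for open('data.csv', newline=''); the header row is an assumption.
sample = io.StringIO(
    "column1,column2\n"
    "a,54.2\n"
    "s,78.5\n"
    "k,89.62\n"
    "a,77.2\n"
    "a,65.56\n"
)

def group_as_floats(csvfile, has_header=True):
    """Map each first-column key to the list of its second-column values as floats."""
    reader = csv.reader(csvfile)
    if has_header:
        next(reader, None)  # discard the header row, if the file has one
    grouped = defaultdict(list)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        grouped[row[0]].append(float(row[1]))
    return dict(grouped)

result = group_as_floats(sample)
print(result['a'])  # [54.2, 77.2, 65.56]
print(result['s'])  # [78.5]
print(result['k'])  # [89.62]
```

If the file has no header row, pass `has_header=False`. A value that is not a valid number would make `float()` raise `ValueError`, which is probably what you want for bad data.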
Answer 1 (score: 1)
One approach is to use pandas: populate a DataFrame, call groupby, and then apply list to each group:
import pandas as pd
df = pd.DataFrame({'column1': ['a', 's', 'k', 'a', 'a'],
                   'column2': [54.2, 78.5, 89.62, 77.2, 65.56]})
print(df.groupby('column1')['column2'].apply(list))
Output:
column1
a [54.2, 77.2, 65.56]
k [89.62]
s [78.5]
Name: column2, dtype: object
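The example above builds the DataFrame by hand. A rough sketch of the same idea starting from CSV text, assuming a comma-separated file without a header row: the `csv_text` variable stands in for the real file, and the column names passed to `names` are chosen here to match the question.

```python
import io

import pandas as pd

# Stand-in for the real file; with a file on disk, pass its path to read_csv instead.
csv_text = io.StringIO("a,54.2\ns,78.5\nk,89.62\na,77.2\na,65.56\n")

# header=None: the data has no header row, so supply the column names ourselves.
df = pd.read_csv(csv_text, header=None, names=['column1', 'column2'])

# Group by the first column, turn each group's values into a list,
# then convert the resulting Series into a plain dict.
grouped = df.groupby('column1')['column2'].apply(list).to_dict()

print(grouped['a'])
print(grouped['s'])
print(grouped['k'])
```

Here `read_csv` parses the second column as floats, so no explicit conversion is needed, and `to_dict()` gives per-key access similar to the dict in the first answer.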