I have two pandas DataFrames, df1 and df2. For each row of df2, I want to sum the df1 values whose names appear in that row's list.
For example, df1:
df1 = pd.DataFrame([['a',11],['b',13],['c',45],['d',88]],columns=['name1','data1'])
df1
name1 data1
0 a 11
1 b 13
2 c 45
3 d 88
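and df2:
df2 = pd.DataFrame([['a',['b','c','d']],['b',['a','c']]],columns=['name2','data2'])
df2
name2 data2
0 a [b, c, d]
1 b [a, c]
In the end, I want data2 replaced by the sum of the matching data1 values (146 for row a and 56 for row b). How can I do this? Thanks a lot.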
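Answer 0 (score: 3)
First build a dictionary from df1, then use a list comprehension that looks up each name with dict.get, substituting 0 for names that have no match, and sum the results: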
d = df1.set_index('name1')['data1'].to_dict()
df2['data2'] = [sum(d.get(y, 0) for y in x) for x in df2['data2']]
print (df2)
name2 data2
0 a 146
1 b 56
If NaN values are possible and should be left out of the sum, use filter with a condition (v == v is False only for NaN):
import numpy as np
import pandas as pd

df1 = pd.DataFrame([['a',11],['b',13],['c',45],['d',np.nan]],columns=['name1','data1'])
print (df1)
name1 data1
0 a 11.0
1 b 13.0
2 c 45.0
3 d NaN
df2 = pd.DataFrame([['a',['b','c','d']],['b',['a','c']]],columns=['name2','data2'])
d = df1.set_index('name1')['data1'].to_dict()
df2['data2'] = [sum(filter(lambda v: v==v, (d.get(y, 0) for y in x))) for x in df2['data2']]
print (df2)
name2 data2
0 a 58.0
1 b 56.0
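To make the v == v trick less cryptic, here is a minimal sketch in plain Python. The dictionary d and the list names are hypothetical stand-ins for the lookup built from df1 and for one row of df2['data2']; math.isnan is shown as a more explicit alternative, not as part of the answer above.

```python
import math

# Hypothetical lookup mirroring df1, with a missing value for 'd'.
d = {"a": 11.0, "b": 13.0, "c": 45.0, "d": math.nan}
names = ["b", "c", "d"]  # stand-in for one list from df2['data2']

# NaN is the only float that is not equal to itself, so v == v drops it.
kept = [v for v in (d.get(n, 0) for n in names) if v == v]
total = sum(kept)

# math.isnan expresses the same filter more explicitly.
total_isnan = sum(v for v in (d.get(n, 0) for n in names) if not math.isnan(v))

print(kept, total, total_isnan)
```

Both totals skip the NaN for d, so only the values for b and c are added.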
Answer 1 (score: 2)
This also works:
d = dict(df1.values)
df2['s'] = df2.data2.transform(lambda v: pd.Series(v).map(d)).sum(1)
0 146.0
1 56.0
dtype: float64
or
df2.data2.transform(lambda l: sum(d[i] for i in l))
0 146.0
1 56.0
dtype: float64
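A different route, not taken from the answers here, is to flatten the lists with Series.explode (available in pandas 0.25 and later, as far as I know) and let groupby do the summing. This is only a sketch: the names lookup, exploded, sums and the column total are made up for illustration, and it assumes the names in df1 are unique.

```python
import pandas as pd

df1 = pd.DataFrame([['a', 11], ['b', 13], ['c', 45], ['d', 88]],
                   columns=['name1', 'data1'])
df2 = pd.DataFrame([['a', ['b', 'c', 'd']], ['b', ['a', 'c']]],
                   columns=['name2', 'data2'])

# Series indexed by name, used as the mapping (assumes unique names).
lookup = df1.set_index('name1')['data1']

# One row per list element; the original row label is repeated in the index.
exploded = df2['data2'].explode()

# Map each name to its value, then sum per original row label.
# Names missing from df1 map to NaN, which sum() skips by default.
sums = exploded.map(lookup).groupby(level=0).sum()

# Assignment aligns on the index, so each sum lands on its source row.
df2['total'] = sums
print(df2)
```

This keeps the per-element work inside pandas instead of a Python-level lambda, which may matter for long lists, though I have not measured it.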
Answer 2 (score: 1)
You can use pivot on df1 to turn the names into columns, then index into the result with the lists from df2:
pivoted = df1.pivot(columns="name1").data1.sum()
df2.data2 = df2.data2.apply(lambda x: pivoted[x].sum())
name2 data2
0 a 146.0
1 b 56.0
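Answer 3 (score: 1)
You can use collections.defaultdict together with dict.__getitem__. Names missing from df1 (such as e in the output below) then count as 0. For larger DataFrames, this should be more efficient than a generator expression: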
from collections import defaultdict
d = defaultdict(int, df1.set_index('name1')['data1'].to_dict())
df2['sum'] = [sum(map(d.__getitem__, x)) for x in df2['data2']]
print(df2)
name2 data2 sum
0 a [b, c, d] 146
1 b [a, c, e] 56
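One thing worth knowing about the defaultdict approach: looking up a missing key through __getitem__ also stores that key with the default value, so the dictionary grows as unknown names are seen. The sketch below uses plain Python with hypothetical data (base, names) to show this next to the dict.get variant, which leaves the dictionary untouched.

```python
from collections import defaultdict

# Hypothetical lookup mirroring df1; 'e' is deliberately absent.
base = {"a": 11, "b": 13, "c": 45, "d": 88}
names = ["a", "c", "e"]

# defaultdict(int) returns 0 for unknown keys.
d = defaultdict(int, base)
total_map = sum(map(d.__getitem__, names))

# Side effect: the lookup of 'e' inserted d['e'] = 0.
grew = len(d) - len(base)

# dict.get gives the same total without modifying the dictionary.
total_get = sum(base.get(n, 0) for n in names)

print(total_map, total_get, grew, 'e' in d, 'e' in base)
```

If the set of unknown names is large, the extra entries cost memory, so dict.get may be the safer choice in that case.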