I have the following two (simplified) data frames:
df1=
   origin destination  val1  val2
0       1           A   0.8   0.9
1       1           B   0.3   0.5
2       1           c   0.4   0.2
3       2           A   0.4   0.7
4       2           B   0.2   0.1
5       2           c   0.5   0.1
df2=
   org  price
0    1     50
1    2     45
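For each destination, I need to take the price of each origin from df2, multiply it by the sum val1 + val2 from the matching row of df1, add up these products, and write the totals to a CSV file.
The calculation for A is as follows:
A => (0.8 + 0.9) * 50 + (0.4 + 0.7) * 45 = 134.5
Here, the values 0.8, 0.9, 0.4 and 0.7 come from df1 and are the val1 and val2 entries for A, while 50 and 45 come from df2 and are the prices for origin 1 and origin 2 respectively.
For B, the calculation is:
B => (0.3 + 0.5) * 50 + (0.2 + 0.1) * 45 = 53.5
For C, the calculation is:
C => (0.4 + 0.2) * 50 + (0.5 + 0.1) * 45 = 57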
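These three totals can be double-checked with plain Python arithmetic. This is only a sanity-check sketch: the names `prices`, `vals` and `totals` are made up for illustration and do not appear in the original code, and the numbers are copied by hand from the tables above.

```python
# price per origin, copied from df2
prices = {1: 50, 2: 45}

# (val1, val2) per destination and origin, copied from df1
vals = {
    "A": {1: (0.8, 0.9), 2: (0.4, 0.7)},
    "B": {1: (0.3, 0.5), 2: (0.2, 0.1)},
    "C": {1: (0.4, 0.2), 2: (0.5, 0.1)},
}

# for each destination: sum over origins of (val1 + val2) * price
totals = {
    dest: sum((v1 + v2) * prices[origin] for origin, (v1, v2) in rows.items())
    for dest, rows in vals.items()
}

for dest, total in totals.items():
    # round only for display, to hide floating-point noise
    print(dest, round(total, 6))
```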
The final CSV file should look like this:
A,134.5
B,53.5
C,57
I wrote the following Python code for this:
# first convert the second table into a python dictionary so that I can refer price value at each origin
df2_dictionary = {}
for ind in df2.index:
    df2_dictionary[df2['org'][ind]] = float(df2['price'][ind])
# now go through df1, add up val1 and val2 and add the result to the result dictionary.
result = {}
for ind in df1.index:
    origin = df1['origin'][ind]
    price = df2_dictionary[origin]  # figure out the price from the dictionary.
    r = (df1['val1'][ind] + df1['val2'][ind]) * price  # this is the needed calculation
    destination = df1['destination'][ind]  # store the result in destination
    if destination in result:
        result[destination] = result[destination] + r
    else:
        result[destination] = r
f = open("result.csv", "w")
for key in result:
    f.write(key + "," + str(result[key]) + "\n")
f.close()
This is a lot of work and does not use any of pandas' built-in functions. How can I simplify it? I am not worried about efficiency.
Answer 0 (score: 1)
You can solve your problem by first using map and then groupby:
# look up each row's price by origin, then multiply it by val1 + val2
df1['total'] = (df1[['val1', 'val2']].sum(axis=1)
                .mul(df1['origin'].map(df2.set_index('org').price)))
summary = df1.groupby('destination')['total'].sum()
# save to csv
summary.to_csv('/path/to/file.csv')
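For reference, here is a self-contained sketch of the same map + groupby approach, with the two frames rebuilt from the question. A few things are my assumptions rather than part of the answer: `header=False` is added because the desired CSV has no header row, `float_format='%g'` is one possible way to print 57 instead of 57.0 and to hide floating-point noise, and the destination is spelled 'C' to match the expected output (the question's table prints it as 'c').

```python
import pandas as pd

# frames rebuilt from the question; 'C' is upper case here (assumption)
df1 = pd.DataFrame({
    "origin": [1, 1, 1, 2, 2, 2],
    "destination": ["A", "B", "C", "A", "B", "C"],
    "val1": [0.8, 0.3, 0.4, 0.4, 0.2, 0.5],
    "val2": [0.9, 0.5, 0.2, 0.7, 0.1, 0.1],
})
df2 = pd.DataFrame({"org": [1, 2], "price": [50, 45]})

# Series indexed by origin, so map() can look prices up by origin value
price_by_origin = df2.set_index("org")["price"]

# (val1 + val2) * price, row by row
df1["total"] = df1[["val1", "val2"]].sum(axis=1) * df1["origin"].map(price_by_origin)

# add up the row totals per destination
summary = df1.groupby("destination")["total"].sum()

# with no path given, to_csv returns the CSV text instead of writing a file
csv_text = summary.to_csv(header=False, float_format="%g")
print(csv_text)
```

To write a file instead, pass a path as the first argument, for example `summary.to_csv("result.csv", header=False, float_format="%g")`.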
Output (summary):
destination
A 134.5
B 53.5
c 57.0
Name: total, dtype: float64
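If you would rather not add a column to df1, a variant of the same idea joins the two frames with merge first. This is an alternative sketch, not the method used above; the name `merged` is hypothetical, and the destination is again spelled 'C' as an assumption to match the expected CSV.

```python
import pandas as pd

# frames rebuilt from the question; 'C' is upper case here (assumption)
df1 = pd.DataFrame({
    "origin": [1, 1, 1, 2, 2, 2],
    "destination": ["A", "B", "C", "A", "B", "C"],
    "val1": [0.8, 0.3, 0.4, 0.4, 0.2, 0.5],
    "val2": [0.9, 0.5, 0.2, 0.7, 0.1, 0.1],
})
df2 = pd.DataFrame({"org": [1, 2], "price": [50, 45]})

# attach the price of each row's origin; df1 itself is left unchanged
merged = df1.merge(df2, left_on="origin", right_on="org", how="left")

# (val1 + val2) * price per row, then summed per destination
merged["total"] = (merged["val1"] + merged["val2"]) * merged["price"]
summary = merged.groupby("destination")["total"].sum()

# round only for display, to hide floating-point noise
print(summary.round(1))
```

With `how="left"`, an origin that is missing from df2 would get a NaN price rather than being dropped silently, which makes such gaps easier to spot.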