我有两个pandas数据框,如下所示:
import pandas as pd
df_one = pd.DataFrame( {
'A': [1,1,2,3,4,4,4],
'B1': [0.5,0.0,0.2,0.1,0.3,0.2,0.1],
'B2': [0.2,0.3,0.1,0.5,0.3,0.1,0.2],
'B3': [0.1,0.2,0.0,0.9,0.0,0.3,0.5]} );
df_two = pd.DataFrame( {
'A': [1,2,3,4],
'C1': [1.0,9.0,2.1,9.0],
'C2': [2.0,3.0,0.7,1.1],
'C3': [5.0,4.0,2.3,3.4]} );
df_one
A B1 B2 B3
0 1 0.5 0.2 0.1
1 1 0.0 0.3 0.2
2 2 0.2 0.1 0.0
3 3 0.1 0.5 0.9
4 4 0.3 0.3 0.0
5 4 0.2 0.1 0.3
6 4 0.1 0.2 0.5
df_two
A C1 C2 C3
0 1 1.0 2.0 5.0
1 2 9.0 3.0 4.0
2 3 2.1 0.7 2.3
3 4 9.0 1.1 3.4
我想要做的是计算是一个标量积,我将第一个数据帧的行乘以第二个数据帧的行,即\sum_i B_i * C_i
,但是这样的方式是仅当A
列的值在两个帧中匹配时,第一个数据帧中的行才会乘以第二个数据帧中的行。我知道如何循环和使用if,但我想以更高效的类似numpy或pandas的方式做到这一点。任何帮助非常感谢:)
答案 0 :(得分:1)
不确定是否需要A列的唯一值(如果这样做,请在下面的结果中使用groupby)
pd.merge(df_one, df_two, on='A')
A B1 B2 B3 C1 C2 C3
0 1 0.5 0.2 0.1 1.0 2.0 5.0
1 1 0.0 0.3 0.2 1.0 2.0 5.0
2 2 0.2 0.1 0.0 9.0 3.0 4.0
3 3 0.1 0.5 0.9 2.1 0.7 2.3
4 4 0.3 0.3 0.0 9.0 1.1 3.4
5 4 0.2 0.1 0.3 9.0 1.1 3.4
6 4 0.1 0.2 0.5 9.0 1.1 3.4
pd.merge(df_one, df_two, on='A').apply(lambda s: sum([s['B%d'%i] * s['C%d'%i] for i in range(1, 4)]) , axis=1)
0 1.40
1 1.60
2 2.10
3 2.63
4 3.03
5 2.93
6 2.82
答案 1 :(得分:1)
另一种方法类似于此:
import pandas as pd
df_one = pd.DataFrame( {
'A': [1,1,2,3,4,4,4],
'B1': [0.5,0.0,0.2,0.1,0.3,0.2,0.1],
'B2': [0.2,0.3,0.1,0.5,0.3,0.1,0.2],
'B3': [0.1,0.2,0.0,0.9,0.0,0.3,0.5]} );
df_two = pd.DataFrame( {
'A': [1,2,3,4],
'C1': [1.0,9.0,2.1,9.0],
'C2': [2.0,3.0,0.7,1.1],
'C3': [5.0,4.0,2.3,3.4]} );
lookup = df_two.groupby(df_two.A)
def multiply_rows(row):
other = lookup.get_group(row['A'])
# We want every column after "A"
x = row.values[1:]
# In this case, other is a 2D array with one row, similar to "row" above...
y = other.values[0, 1:]
return x.dot(y)
# The "axis=1" makes each row to be passed in, rather than each column
result = df_one.apply(multiply_rows, axis=1)
print result
这导致:
0 1.40
1 1.60
2 2.10
3 2.63
4 3.03
5 2.93
6 2.82
答案 2 :(得分:0)
我会将行压缩在一起并使用过滤器或仅包含A列匹配的行的理解。
像
这样的东西[scalar_product(a,b) for a,b in zip (frame1, frame2) if a[0]==b[0]]
假设您愿意为scalar_product填写适当的材料
(如果我在这里做了一个思考,请道歉 - 这段代码仅供参考,尚未经过测试!)