df = {1,2,3,
4,5,6,
7,8,9,
10,11,12
}
weights = {[1,3,3],[2,2,2],[3,1,1]}
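I want to multiply my df by each row of the weights matrix. That gives me three different dfs, one per weight vector, which I then want to combine by keeping, for each row position, the row with the largest value. For example: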
df0 = df * weights[0] = {1,6,9,
4,15,18,
7,24,27,
10,33,36
}
df1 = df * weights[1] = {2,4,6,
8,10,12,
14,16,18,
20,22,24
}
df2 = df * weights[2] = {3,2,3,
12,5,6,
21,8,9,
30,11,12
}
and
final_df_lines = max{df0,df1,df2} = {1,6,9 - max row from df0,
4,15,18 - max row from df0,
7,24,27 - max row from df0,
10,33,36 - max row from df0
}
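In this example all the max rows come from df0, but in general each one could come from any of the three dfs. The "max row" is determined simply by summing the numbers in that row.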
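Written out, the selection rule is a per-row argmax over weighted row sums. The notation below (X for df with rows x_i, w_k for the k-th weight vector, s_{ik} for the score) is introduced here only for illustration, and the question does not say how ties should be broken. The useful observation is that the row sum of x_i multiplied element-wise by w_k is the dot product x_i · w_k, so all scores come from one matrix product:

```latex
s_{ik} = \sum_{j} x_{ij}\, w_{kj} = \left(X W^{\top}\right)_{ik}, \qquad
k^{*}_{i} = \arg\max_{k}\, s_{ik}, \qquad
\text{final}_{i} = x_{i} \odot w_{k^{*}_{i}}
```

For row 0, x_0 = (1, 2, 3) gives s_00 = 1 + 6 + 9 = 16, s_01 = 2 + 4 + 6 = 12 and s_02 = 3 + 2 + 3 = 8, so k*_0 = 0 and the row comes from df0, as in the example.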
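I need to vectorize this (no loops or ifs). How can I do it? Is it even possible? I really need help :( I've been searching online for 2 days... I've been at this in Python for far too long...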
Answer 0 (score: 1)
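Edit: since the question was updated, I had to update my answer as well:
To do element-wise matrix operations without any loops, you first have to align the matrices so they have the same shape: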
import numpy as np
a = [
[1,2,3],
[4,5,6],
[7,8,9],
[10,11,12]
]
weights = [
[1,3,3],
[2,2,2],
[3,1,1]
]
# repeat each weight vector once per row of a -> shape (3, 4, 3)
w_s = np.array( (4 * [weights[0]], 4 * [weights[1]], 4 * [weights[2]]) )
# three copies of a -> also shape (3, 4, 3), aligned with w_s
a_s = np.array(3 * [a])
# element-wise product: result_matrix[k] == a * weights[k]
result_matrix = w_s * a_s
print(result_matrix)
Output:
[[[ 1  6  9]
  [ 4 15 18]
  [ 7 24 27]
  [10 33 36]]

 [[ 2  4  6]
  [ 8 10 12]
  [14 16 18]
  [20 22 24]]

 [[ 3  2  3]
  [12  5  6]
  [21  8  9]
  [30 11 12]]]
This solution uses numpy, but of course it can also be done with pandas.
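The numpy answer above builds the three weighted matrices but stops before picking the row with the largest sum. A minimal sketch of that last step, assuming the question's data; the names A, W, products, sums, best and final are illustrative, and ties go to the first weight vector because argmax returns the first maximum. Broadcasting with W[:, None, :] stands in for the manual 4 * [...] repetition, so the code does not hard-code the number of rows.

```python
import numpy as np

# Data from the question: 4 rows x 3 columns, and 3 weight vectors of length 3.
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]])
W = np.array([[1, 3, 3],
              [2, 2, 2],
              [3, 1, 1]])

# Broadcasting: (3, 1, 3) * (1, 4, 3) -> (3, 4, 3), so products[k] == A * W[k]
# (the question's df0, df1, df2).
products = W[:, None, :] * A[None, :, :]

# Row sums of each weighted matrix: sums[k, i] is the sum of row i of df_k.
sums = products.sum(axis=2)                      # shape (3, 4)

# For each row i, the index k of the weighted matrix with the largest row sum.
best = sums.argmax(axis=0)                       # shape (4,)

# Take row i from products[best[i]] with integer-array indexing (no Python loop).
final = products[best, np.arange(A.shape[0])]    # shape (4, 3)
print(final)
```

With the question's data every entry of best is 0, so final holds the same values as df0.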
Answer 1 (score: 1)
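You can try concatenating the products for all the weights into a single DataFrame, adding a column suffix that records which weight each column was multiplied by. Then group by that weight, sum, and take the index of the maximum sum for each row. Finally, multiply the DataFrame by the weight at that index.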
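import pandas as pd
# the question's data
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
weights = [[1, 3, 3], [2, 2, 2], [3, 1, 1]]
# multiply df by each weight vector, tag the columns with that weight, then transpose
df2 = pd.concat([(df*i).add_suffix('__'+str(i)) for i in weights],axis=1).T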
0 1 2 3
0__[1, 3, 3] 1 4 7 10
1__[1, 3, 3] 6 15 24 33
2__[1, 3, 3] 9 18 27 36
0__[2, 2, 2] 2 8 14 20
1__[2, 2, 2] 4 10 16 22
2__[2, 2, 2] 6 12 18 24
0__[3, 1, 1] 3 12 21 30
1__[3, 1, 1] 2 5 8 11
2__[3, 1, 1] 3 6 9 12
# by grouping with respect to the weight it multiplied, get max index
a = df2.groupby(df2.index.str.split('__').str[1]).sum().idxmax()
# parse the winning weight label (e.g. '[1, 3, 3]') back into a list of ints, one per row
df['idxmax'] = a.str.slice(1,-1).str.split(',').apply(lambda x: list(map(int,x)))
0    [1, 3, 3]
1    [1, 3, 3]
2    [1, 3, 3]
3    [1, 3, 3]
dtype: object
# multiply each row's values by the weight vector selected for that row
df.apply(lambda x: x.loc[df.columns.difference(['idxmax'])] * x['idxmax'], axis=1)
0 1 2
0 1 6 9
1 4 15 18
2 7 24 27
3 10 33 36
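A loop-free pandas variant is possible if you swap in a different technique from the answer above: the sum of a row multiplied element-wise by w equals the dot product of the row with w, so one matrix product scores every row/weight pair at once, with no string labels, list comprehension or row-wise apply. This is a sketch built on that observation, not the answer author's method; X, W, scores, best and final_df are illustrative names, and ties go to the first weight vector.

```python
import numpy as np
import pandas as pd

# The question's data (default integer index and columns 0, 1, 2).
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
W = np.array([[1, 3, 3], [2, 2, 2], [3, 1, 1]])

X = df.to_numpy()
# scores[i, k] = sum_j X[i, j] * W[k, j], i.e. the row sum of df_k at row i.
scores = X @ W.T                    # shape (4, 3)

# Index of the winning weight vector for each row (argmax keeps the first on ties).
best = scores.argmax(axis=1)        # shape (4,)

# W[best] lines up one weight vector per row; multiply element-wise
# and keep the original index and column labels.
final_df = pd.DataFrame(X * W[best], index=df.index, columns=df.columns)
print(final_df)
```

Because the scores come from a single matrix product, this stays vectorized as the number of rows or weight vectors grows.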