Multiplying all possible combinations of columns from three or more pandas DataFrames

Time: 2018-09-26 19:13:19

Tags: python pandas

I have three DataFrames:

DT            D1        D2        D3        D4   Unknown
Cstep                                                   
step 0  0.320039  0.048425  0.088292  0.085029  0.240678
step 1  0.226455  0.236200  0.206625  0.165754  0.163254
step 2  0.172478  0.199502  0.221124  0.266180  0.193045
step 3  0.164097  0.209790  0.218212  0.309156  0.180891
step 4  0.116930  0.306083  0.265747  0.173881  0.222132

RE            E1        E2        E3   Unknown
Cstep                                         
step 0  0.256725  0.086275  0.105281  0.244701
step 1  0.213159  0.215714  0.142406  0.162372
step 2  0.187955  0.222353  0.213388  0.192917
step 3  0.164817  0.252570  0.372562  0.188435
step 4  0.177344  0.223087  0.166364  0.211576

DS            S1        S2        S3        S4   Unknown
Cstep                                                   
step 0  0.210452  0.115157  0.019318  0.074852  0.261005
step 1  0.228217  0.188214  0.248233  0.118122  0.150845
step 2  0.189803  0.234401  0.303194  0.242742  0.185957
step 3  0.197577  0.219602  0.246099  0.233097  0.184726
step 4  0.173951  0.242626  0.183155  0.331187  0.217467

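I want code that multiplies every possible combination of columns, taking one column from each of three or more pandas DataFrames. I'd like it to be generic, so that it works column-wise for any number of DataFrames.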

All possible column combinations for these DataFrames are:

[('D1', 'E1', 'S1') ('D1', 'E1', 'S2') ('D1', 'E1', 'S3')
 ('D1', 'E1', 'S4') ('D1', 'E1', 'Unknown') ('D1', 'E2', 'S1')
 ('D1', 'E2', 'S2') ('D1', 'E2', 'S3') ('D1', 'E2', 'S4')
 ('D1', 'E2', 'Unknown') ('D1', 'E3', 'S1') ('D1', 'E3', 'S2')
 ('D1', 'E3', 'S3') ('D1', 'Unknown', 'S1') ('D1', 'Unknown', 'S2')
 ('D1', 'Unknown', 'S4') ('D1', 'Unknown', 'Unknown') ('D2', 'E1', 'S1')
 ('D2', 'E1', 'S2') ('D2', 'E1', 'S3') ('D2', 'E1', 'S4')
 ('D2', 'E1', 'Unknown') ('D2', 'E2', 'S1') ('D2', 'E2', 'S2')
 ('D2', 'E2', 'S3') ('D2', 'E2', 'S4') ('D2', 'E2', 'Unknown')
 ('D2', 'E3', 'S1') ('D2', 'Unknown', 'S1') ('D2', 'Unknown', 'S2')
 ('D2', 'Unknown', 'S3') ('D2', 'Unknown', 'Unknown') ('D3', 'E1', 'S1')
 ('D3', 'E1', 'S2') ('D3', 'E1', 'S3') ('D3', 'E1', 'S4')
 ('D3', 'E1', 'Unknown') ('D3', 'E2', 'S1') ('D3', 'E2', 'S2')
 ('D3', 'E2', 'S3') ('D3', 'E2', 'S4') ('D3', 'E2', 'Unknown')
 ('D3', 'E3', 'S1') ('D3', 'E3', 'S2') ('D3', 'E3', 'S3')
 ('D3', 'E3', 'Unknown') ('D3', 'Unknown', 'S1') ('D3', 'Unknown', 'S2')
 ('D3', 'Unknown', 'S3') ('D3', 'Unknown', 'S4')
 ('D3', 'Unknown', 'Unknown') ('D4', 'E1', 'S1') ('D4', 'E1', 'S2')
 ('D4', 'E1', 'S3') ('D4', 'E1', 'S4') ('D4', 'E1', 'Unknown')
 ('D4', 'E2', 'S1') ('D4', 'E2', 'S2') ('D4', 'E2', 'S3')
 ('D4', 'E2', 'S4') ('D4', 'E2', 'Unknown') ('D4', 'E3', 'S1')
 ('D4', 'E3', 'S2') ('D4', 'E3', 'S3') ('D4', 'E3', 'S4')
 ('D4', 'E3', 'Unknown') ('D4', 'Unknown', 'S1') ('D4', 'Unknown', 'S2')
 ('D4', 'Unknown', 'S3') ('D4', 'Unknown', 'S4')
 ('D4', 'Unknown', 'Unknown') ('Unknown', 'E1', 'S1')
 ('Unknown', 'E1', 'S2') ('Unknown', 'E1', 'S3') ('Unknown', 'E1', 'S4')
 ('Unknown', 'E1', 'Unknown') ('Unknown', 'E2', 'S1')
 ('Unknown', 'E2', 'S2') ('Unknown', 'E2', 'S3') ('Unknown', 'E2', 'S4')
 ('Unknown', 'E2', 'Unknown') ('Unknown', 'E3', 'S1')
 ('Unknown', 'E3', 'S2') ('Unknown', 'E3', 'S3') ('Unknown', 'E3', 'S4')
 ('Unknown', 'Unknown', 'S1') ('Unknown', 'Unknown', 'S2')
 ('Unknown', 'Unknown', 'S3') ('Unknown', 'Unknown', 'S4')
 ('Unknown', 'Unknown', 'Unknown')]

1 answer:

Answer 0: (score: 1)

pd.concat

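from functools import reduce
from operator import mul
from itertools import product

import pandas as pd

# d1, d2, d3 are the three DataFrames from the question.
# Iterating a DataFrame yields its column labels, so product(d1, d2, d3)
# generates every (column of d1, column of d2, column of d3) tuple.
pd.concat({
    # Multiply the selected column of each frame together, row by row.
    k: reduce(mul, (d[c] for d, c in zip([d1, d2, d3], k)))
    for k in product(d1, d2, d3)
}, axis=1)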

              D1                                                                                              ...      Unknown                                                                                          
              E1                                                E2                                            ...           E3                                           Unknown                                        
              S1        S2        S3        S4   Unknown        S1        S2        S3        S4   Unknown    ...           S1        S2        S3        S4   Unknown        S1        S2        S3        S4   Unknown
step 0  0.017291  0.009462  0.001587  0.006150  0.021445  0.005811  0.003180  0.000533  0.002067  0.007207    ...     0.005333  0.002918  0.000489  0.001897  0.006614  0.012394  0.006782  0.001138  0.004408  0.015372
step 1  0.011016  0.009085  0.011982  0.005702  0.007281  0.011148  0.009194  0.012126  0.005770  0.007369    ...     0.005306  0.004376  0.005771  0.002746  0.003507  0.006050  0.004989  0.006580  0.003131  0.003999
step 2  0.006153  0.007599  0.009829  0.007869  0.006028  0.007279  0.008990  0.011628  0.009309  0.007132    ...     0.007819  0.009656  0.012490  0.009999  0.007660  0.007069  0.008729  0.011291  0.009040  0.006925
step 3  0.005344  0.005939  0.006656  0.006304  0.004996  0.008189  0.009102  0.010200  0.009661  0.007656    ...     0.013315  0.014800  0.016585  0.015709  0.012449  0.006735  0.007485  0.008389  0.007945  0.006297
step 4  0.003607  0.005031  0.003798  0.006868  0.004510  0.004538  0.006329  0.004778  0.008639  0.005673    ...     0.006428  0.008966  0.006768  0.012239  0.008036  0.008175  0.011403  0.008608  0.015565  0.010220

[5 rows x 100 columns]
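
Since the question asks for any number of DataFrames, the same idea can be wrapped in a function that takes a list of frames. This is only a sketch: the function name `multiply_column_combinations` and the small example frames are made up for illustration, and it assumes the frames share the same index.

```python
from functools import reduce
from itertools import product
from operator import mul

import pandas as pd


def multiply_column_combinations(frames):
    """Multiply every combination of one column per frame (hypothetical helper)."""
    frames = list(frames)
    return pd.concat(
        {
            # key is a tuple holding one column label per frame
            key: reduce(mul, (frame[col] for frame, col in zip(frames, key)))
            for key in product(*frames)  # iterating a DataFrame yields its column labels
        },
        axis=1,
    )


# Small made-up frames with a shared index, for illustration only.
idx = pd.Index(["step 0", "step 1"], name="Cstep")
d1 = pd.DataFrame({"D1": [1.0, 2.0], "Unknown": [3.0, 4.0]}, index=idx)
d2 = pd.DataFrame({"E1": [10.0, 20.0], "Unknown": [30.0, 40.0]}, index=idx)
d3 = pd.DataFrame({"S1": [0.5, 0.25]}, index=idx)

result = multiply_column_combinations([d1, d2, d3])
print(result)
```

Passing a list of four or more frames works the same way; each extra frame adds one more level to the resulting column MultiIndex.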

NumPy broadcasting and pandas.MultiIndex.from_product

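# Assumption: v1, v2, v3 are the underlying NumPy arrays of d1, d2, d3
# (they are not defined in the original snippet).
v1, v2, v3 = d1.values, d2.values, d3.values

# Give each frame's columns their own axis so broadcasting forms every combination.
a = v1[:, :, None, None] * v2[:, None, :, None] * v3[:, None, None, :]
# Flatten the three column axes into one; the last axis varies fastest.
a = a.reshape(len(a), -1)

# from_product enumerates the labels in the same order as the reshape.
cols = pd.MultiIndex.from_product([d1, d2, d3])

pd.DataFrame(a, d1.index, cols)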
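
The broadcasting version can probably be generalized to any number of frames by reshaping each array so that its columns sit on their own axis. The following is a sketch, not the answerer's code: the name `multiply_column_combinations_np` and the example frames are hypothetical, and it assumes all frames have the same rows in the same order, because NumPy multiplies by position and does not align on the index.

```python
from functools import reduce

import numpy as np
import pandas as pd


def multiply_column_combinations_np(frames):
    """Broadcast-multiply one column per frame for every combination (hypothetical helper)."""
    frames = list(frames)
    n = len(frames)
    arrays = []
    for i, frame in enumerate(frames):
        # Shape (rows, 1, ..., n_cols, ..., 1): this frame's columns live on axis i + 1.
        shape = [len(frame)] + [1] * n
        shape[i + 1] = frame.shape[1]
        arrays.append(frame.to_numpy().reshape(shape))
    block = reduce(np.multiply, arrays)   # broadcasts to (rows, c1, c2, ..., cn)
    flat = block.reshape(len(block), -1)  # C order: the last frame's columns vary fastest
    cols = pd.MultiIndex.from_product([frame.columns for frame in frames])
    return pd.DataFrame(flat, index=frames[0].index, columns=cols)


# Small made-up frames with a shared index, for illustration only.
idx = pd.Index(["step 0", "step 1"], name="Cstep")
d1 = pd.DataFrame({"D1": [1.0, 2.0], "Unknown": [3.0, 4.0]}, index=idx)
d2 = pd.DataFrame({"E1": [10.0, 20.0], "Unknown": [30.0, 40.0]}, index=idx)
d3 = pd.DataFrame({"S1": [0.5, 0.25]}, index=idx)

out = multiply_column_combinations_np([d1, d2, d3])
print(out)
```

The intermediate array holds rows × c1 × ... × cn values, which is the same size as the final result, so this approach trades the Python-level loop over combinations for a single vectorized multiplication.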