How to efficiently do pairwise multiplication across a DataFrame

Time: 2018-08-28 09:39:47

Tags: python pandas dataframe feature-extraction

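I want to do feature engineering on several numeric features. The idea is to multiply the columns of a DataFrame pairwise. I would prefer something available in a machine learning library such as TensorFlow, Keras, TPOT, etc. (I don't know the scientific name for this process), but a solution without a library is fine too.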

Here is my simplified dataset:

No  feature_1  feature_2  feature_3
1          10         20         30
2          20         30         40 

This is what I need:

No  feature_1  feature_2  feature_3  feature_1xfeature_2  feature_1xfeature_3  feature_2xfeature_3
1          10         20         30                  200                  300                  600
2          20         30         40                  600                  800                 1200

What I did:

df['feature_1xfeature_2'] = df['feature_1'] * df['feature_2']
df['feature_1xfeature_3'] = df['feature_1'] * df['feature_3']
df['feature_2xfeature_3'] = df['feature_2'] * df['feature_3']

This is error-prone when there are many features. How can I automate it?

2 answers:

Answer 0: (score: 1)

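You can use itertools to multiply every pair of columns:

import itertools

# repeat=2 pairs df.columns with itself; passing 2 positionally raises a TypeError
for col_a, col_b in itertools.product(df.columns, repeat=2):
    df[col_a + 'x' + col_b] = df[col_a] * df[col_b]

itertools.product(df.columns, repeat=2) yields every ordered pair of columns taken from df.columns, including each column paired with itself.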

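Edit

Looking at your question more closely, I think itertools.combinations is a better fit. It does not produce every ordered pair (the full product), only each unordered pair of distinct columns once.

For example, given the columns 'A', 'B', 'C':

itertools.product yields ('A','A'), ('A','B'), ('A','C'), ('B','A'), ('B','B'), ('B','C'), ('C','A'), ('C','B'), ('C','C').

itertools.combinations yields ('A','B'), ('A','C'), ('B','C').

So this would be better: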
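The difference between the two can be checked with the standard library alone. The variable names below (cols, ordered_pairs, unordered_pairs) are illustrative only:

```python
import itertools

cols = ["A", "B", "C"]

# product with repeat=2: every ordered pair, including self-pairs such as ('A', 'A')
ordered_pairs = list(itertools.product(cols, repeat=2))

# combinations of size 2: each unordered pair of distinct items exactly once
unordered_pairs = list(itertools.combinations(cols, 2))

print(len(ordered_pairs))  # 9
print(unordered_pairs)     # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

For n columns, product gives n * n pairs while combinations gives n * (n - 1) / 2, so the second grows much more slowly as features are added.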

import itertools

for col_a, col_b in itertools.combinations(df.columns, 2):
    df[col_a + 'x' + col_b] = df[col_a] * df[col_b]
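As a self-contained sketch of the same idea on the sample data from the question, the version below assumes that No is the index rather than a feature column. It also collects the new columns first and attaches them in one step, which is one possible design and not the only one:

```python
import itertools
import pandas as pd

# Sample data from the question; treating "No" as the index is an assumption
df = pd.DataFrame(
    {"feature_1": [10, 20], "feature_2": [20, 30], "feature_3": [30, 40]},
    index=pd.Index([1, 2], name="No"),
)

# Snapshot the original feature names so new columns are never paired again
features = list(df.columns)

# One product column per unordered pair of distinct features
interactions = {
    f"{a}x{b}": df[a] * df[b]
    for a, b in itertools.combinations(features, 2)
}

# Attach all new columns at once
result = pd.concat([df, pd.DataFrame(interactions)], axis=1)
print(result)
```

Building the columns in a dict and concatenating once avoids inserting columns into the frame one at a time, which may matter when there are many features.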

Answer 1: (score: 1)

There are other, more specialized ways to automate this, for example scikit-learn's PolynomialFeatures.
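A minimal sketch of that route, assuming scikit-learn is installed and that PolynomialFeatures means sklearn.preprocessing.PolynomialFeatures. The column names are built by hand here to mirror the question's naming; this relies on the assumption that the output lists the original features first, followed by the pairwise products in itertools.combinations order:

```python
import itertools
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

# Sample data from the question
df = pd.DataFrame(
    {"feature_1": [10, 20], "feature_2": [20, 30], "feature_3": [30, 40]}
)

# degree=2 with interaction_only=True keeps the original features and adds
# pairwise products without squares; include_bias=False drops the constant column
poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
values = poly.fit_transform(df.to_numpy())

# Hand-built names (assumed order: originals, then pairwise products)
names = list(df.columns) + [
    f"{a}x{b}" for a, b in itertools.combinations(df.columns, 2)
]
result = pd.DataFrame(values, columns=names, index=df.index)
print(result)
```

Note that the transformer returns floating-point values, so the products come back as 200.0, 300.0 and so on rather than integers.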