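I want to do feature engineering on several numeric features. The idea is to multiply the columns of a DataFrame pairwise. Ideally the answer would use something available in a machine learning library such as TensorFlow, Keras, TPOT, etc. (I don't know the scientific name for this procedure), but a solution without a library is fine too.
Here is my simplified dataset: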
No feature_1 feature_2 feature_3
1 10 20 30
2 20 30 40
This is what I need:
No feature_1 feature_2 feature_3 feature_1xfeature_2 feature_1xfeature_3 feature_2xfeature_3
1 10 20 30 200 300 600
2 20 30 40 600 800 1200
What I did:
df['feature_1xfeature_2'] = df['feature_1'] * df['feature_2']
df['feature_1xfeature_3'] = df['feature_1'] * df['feature_3']
df['feature_2xfeature_3'] = df['feature_2'] * df['feature_3']
This becomes error-prone with a large number of features. How can I automate it?
Answer 0 (score: 1)
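You can use itertools to get the products of all pairs of columns:
import itertools
# repeat=2 is required; passing 2 positionally raises a TypeError
for col_a, col_b in itertools.product(df.columns, repeat=2):
    df[col_a + 'x' + col_b] = df[col_a] * df[col_b]
itertools.product(df.columns, repeat=2) yields every ordered pair of columns drawn from df.columns, including each column paired with itself.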
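Looking at your question more closely, I think itertools.combinations is the better choice. It does not yield every ordered pair, only every unordered pair of distinct columns, so self-pairs and mirrored duplicates are skipped.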
For example, given the columns 'A', 'B', 'C':
itertools.product (with repeat=2) yields ('A','A'), ('A','B'), ('A','C'), ('B','A'), ('B','B'), ('B','C'), ('C','A'), ('C','B'), ('C','C').
itertools.combinations (with r=2) yields ('A','B'), ('A','C'), ('B','C').
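The difference between the two iterators can be checked with the standard library alone. The snippet below is just an illustrative sketch; the variable names are made up for the example.

```python
import itertools

cols = ["A", "B", "C"]

# Ordered pairs, including self-pairs such as ('A', 'A'): 3 * 3 = 9 tuples.
ordered_pairs = list(itertools.product(cols, repeat=2))

# Unordered pairs of distinct items: 3 choose 2 = 3 tuples.
unordered_pairs = list(itertools.combinations(cols, 2))

print(len(ordered_pairs))
print(unordered_pairs)
```

With product, every multiplication would be computed twice (AxB and BxA) and each column would also be squared, which is not what the question asks for.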
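So this would be better:
import itertools
# Note: this loops over every column in df, including identifier columns
# such as No; select only the feature columns first if that is not wanted.
for col_a, col_b in itertools.combinations(df.columns, 2):
    df[col_a + 'x' + col_b] = df[col_a] * df[col_b]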
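A self-contained version of this approach, run on the sample data from the question, might look like the sketch below. The helper name add_pairwise_products and its sep parameter are hypothetical, introduced only for this example, and the No column is assumed to be an identifier that should be left out of the products.

```python
import itertools

import pandas as pd


def add_pairwise_products(df, columns, sep="x"):
    """Return a copy of df with one product column per unordered pair of `columns`."""
    out = df.copy()
    for col_a, col_b in itertools.combinations(columns, 2):
        out[f"{col_a}{sep}{col_b}"] = df[col_a] * df[col_b]
    return out


# Sample data from the question.
df = pd.DataFrame({
    "No": [1, 2],
    "feature_1": [10, 20],
    "feature_2": [20, 30],
    "feature_3": [30, 40],
})

# Assumption: "No" is a row identifier, so it is excluded from the products.
feature_cols = [c for c in df.columns if c != "No"]
result = add_pairwise_products(df, feature_cols)
print(result)
```

Working on a copy keeps the original DataFrame unchanged, which makes it easier to rerun the step with a different column selection.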
Answer 1 (score: 1)
There are other, more specialized ways to automate this, for example scikit-learn's PolynomialFeatures.