Given a DataFrame, I am trying to pick, out of several tries (represented as columns), the one that best fits the target column.
import pandas as pd # tried with pandas 0.22 and pandas 0.20
data = {0.1: {10000.0: 1.1417023723316702,
20000.0: 1.675669860738065,
30000.0: 2.1391047345794565,
40000.0: 2.588884897140648},
0.3: {10000.0: 3.4251071169950102,
20000.0: 5.027009582214195,
30000.0: 6.4173142037383695,
40000.0: 7.766654691421943},
0.5: {10000.0: 5.708511861658351,
20000.0: 8.378349303690324,
30000.0: 10.695523672897282,
40000.0: 12.94442448570324},
0.7: {10000.0: 7.99191660632169,
20000.0: 11.729689025166454,
30000.0: 14.973733142056194,
40000.0: 18.122194279984534},
0.9: {10000.0: 10.275321350985031,
20000.0: 15.081028746642584,
30000.0: 19.25194261121511,
40000.0: 23.29996407426583},
'target': {10000.0: 8.95547589186585,
20000.0: 12.664955463781974,
30000.0: 15.511339250669858,
40000.0: 17.9109517837317}}
values = pd.DataFrame(data)
values
Out[4]:
0.1 0.3 0.5 0.7 0.9 target
10000.0 1.141702 3.425107 5.708512 7.991917 10.275321 8.955476
20000.0 1.675670 5.027010 8.378349 11.729689 15.081029 12.664955
30000.0 2.139105 6.417314 10.695524 14.973733 19.251943 15.511339
40000.0 2.588885 7.766655 12.944424 18.122194 23.299964 17.910952
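My plan was to use pandas DataFrame.corr() to get a quick hint. However, the result is not really usable: the correlation of target with every try column is identical (0.998252), and the try columns all correlate with each other at exactly 1.
What is wrong with this approach?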
values.corr()
Out[5]:
0.1 0.3 0.5 0.7 0.9 target
0.1 1.000000 1.000000 1.000000 1.000000 1.000000 0.998252
0.3 1.000000 1.000000 1.000000 1.000000 1.000000 0.998252
0.5 1.000000 1.000000 1.000000 1.000000 1.000000 0.998252
0.7 1.000000 1.000000 1.000000 1.000000 1.000000 0.998252
0.9 1.000000 1.000000 1.000000 1.000000 1.000000 0.998252
target 0.998252 0.998252 0.998252 0.998252 0.998252 1.000000
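One possible explanation, offered as a sketch rather than a definitive answer: in the data above every try column looks like the 0.1 column multiplied by a constant (3, 5, 7, 9). Pearson correlation ignores scale, so it cannot distinguish between such columns. A distance-based score such as RMSE does depend on scale. The snippet below rebuilds the data compactly under that assumption. The names `base`, `factors`, `tries`, and `rmse` are illustrative and not part of the original code.

```python
import pandas as pd

index = [10000.0, 20000.0, 30000.0, 40000.0]

# The 0.1 column and the target column, copied from the data above.
base = pd.Series([1.1417023723316702, 1.675669860738065,
                  2.1391047345794565, 2.588884897140648], index=index)
target = pd.Series([8.95547589186585, 12.664955463781974,
                    15.511339250669858, 17.9109517837317], index=index)

# Assumption: each try column is the 0.1 column times a constant factor.
factors = {0.1: 1, 0.3: 3, 0.5: 5, 0.7: 7, 0.9: 9}
tries = pd.DataFrame({label: base * k for label, k in factors.items()})

# Pearson correlation is scale-invariant, so every try gets the same score.
corr_with_target = tries.corrwith(target)

# RMSE against the target depends on scale, so it can rank the tries.
rmse = tries.sub(target, axis=0).pow(2).mean().pow(0.5)
best = rmse.idxmin()

print(corr_with_target)
print(rmse)
print("closest try by RMSE:", best)
```

With this data the 0.7 column has the smallest RMSE. Whether RMSE is the right notion of "best fit" depends on the problem; mean absolute error or a relative error would be reasonable alternatives.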