Scaling values without losing their relative importance (Python, Sklearn)

Date: 2019-02-07 09:36:49

Tags: python python-3.x pandas scikit-learn

A sample of the CSV:

0.03528821669081923,0.4209514856338501
0.4755249949860231,0.4248427748680115
0.09710556840327728,0.4209169149398804
0.07149631133318766,0.4201127290725708
-0.2400341908399068,0.417565792798996
-0.17768551828033466,0.4184338748455048
-0.30025757809215714,0.416279673576355
-0.09094791496191304,0.41964152455329895
0.07154744586719554,0.4196658134460449
0.2381333126503035,0.42377570271492
0.2593105332145284,0.4222800433635712
-0.6691065606953182,0.4089060425758362
-0.6456401882265393,0.4092327654361725
-0.2320063391631248,0.4154394268989563
0.03676064944004283,0.4164957106113434
-0.049027521405378964,0.4175394177436829
-0.5611679536206179,0.4090659916400909
-1.151078217514793,0.3977192640304565
-1.1251183926252533,0.3976330757141113
-1.3598634565590335,0.3943647146224976
-1.452113101667516,0.3926326930522919
-1.724856436518542,0.3888352811336517
-1.3449567318568625,0.3950198888778687
-0.9327234868901516,0.39986416697502136
-0.8698905846258818,0.40163424611091614
-1.0829297248122909,0.4009062349796295
-0.7123502605778409,0.406065821647644
-0.7078240398708294,0.4043383300304413
-1.0054995188827682,0.4010890424251557
-0.40067943737923295,0.41085284948349
-0.3684788480142471,0.4130916893482208
-0.31293912846313354,0.4178936183452606

I have loaded it into pandas and tried to scale it with sklearn.preprocessing.scale(), but it scales each column only according to that column's own values:

import sklearn.preprocessing
df['col1'] = sklearn.preprocessing.scale(df['col1'].values)
df['col2'] = sklearn.preprocessing.scale(df['col2'].values)
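To see why this does not give what the asker wants, here is a minimal sketch, assuming the CSV has no header row and that the columns are named col1 and col2 through the names= argument of read_csv (the names are an assumption taken from the snippet above). It loads a few of the sample rows and shows that scale() standardizes each column with that column's own mean and std, so both columns end up with mean 0 and std 1 regardless of how different their original ranges were.

```python
import io

import pandas as pd
import sklearn.preprocessing

# A few rows from the sample above; the CSV has no header row, so the
# column names col1/col2 are assigned here (an assumption).
csv_text = """0.03528821669081923,0.4209514856338501
0.4755249949860231,0.4248427748680115
-1.724856436518542,0.3888352811336517
-0.31293912846313354,0.4178936183452606
"""
df = pd.read_csv(io.StringIO(csv_text), header=None, names=["col1", "col2"])

# scale() standardizes each call's input with that input's own mean and std.
df["col1"] = sklearn.preprocessing.scale(df["col1"].values)
df["col2"] = sklearn.preprocessing.scale(df["col2"].values)

# Both columns now have mean ~0 and population std ~1, independently,
# so the original difference in scale between them is gone.
print(df["col1"].mean(), df["col1"].std(ddof=0))
print(df["col2"].mean(), df["col2"].std(ddof=0))
```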

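I want to scale one column relative to the other so that I can plot both on the same graph. That only works if the values are in the same range without losing the relative importance of the values within that range.
Please suggest what I can do.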

1 answer:

Answer 0 (score: 1)

One thing you can do is use sklearn.preprocessing.StandardScaler instead: fit it on one array, then use the mean and std it computed to transform the other arrays. So you could do the following:

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

Reshape the numpy arrays taken from the dataframe columns:

col1 = df['col1'].values.reshape(-1,1)
col2 = df['col2'].values.reshape(-1,1)
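The reshape matters because scikit-learn transformers such as StandardScaler expect 2-D input of shape (n_samples, n_features), and recent versions reject a plain 1-D array. A small sketch, assuming a DataFrame named df with columns col1 and col2 as in the question (the three values per column are a rounded subset of the sample, used only for illustration):

```python
import pandas as pd

df = pd.DataFrame({"col1": [0.035, 0.476, -1.725], "col2": [0.421, 0.425, 0.389]})

# .values on a single column is 1-D: shape (n_samples,)
print(df["col1"].values.shape)   # (3,)

# reshape(-1, 1) turns it into a single feature column of shape (n_samples, 1),
# the 2-D layout that StandardScaler.fit/transform expect.
col1 = df["col1"].values.reshape(-1, 1)
print(col1.shape)                # (3, 1)
```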

Fit the instantiated scaler on col1:

fitted = scaler.fit(col1)
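For reference, fit() returns the scaler object itself and stores what it learned from col1 in the mean_ and scale_ attributes; scale_ is the population standard deviation (ddof=0). A minimal check, using a few rounded values from the sample as a stand-in for col1:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Rounded subset of the question's col1, reshaped to (n_samples, 1).
col1 = np.array([0.035, 0.476, -1.725, -0.313]).reshape(-1, 1)

scaler = StandardScaler()
fitted = scaler.fit(col1)   # fit() returns the same scaler instance

# Statistics learned from col1 are stored on the fitted object.
print(fitted.mean_)    # per-feature mean of col1
print(fitted.scale_)   # per-feature population std (ddof=0) of col1
print(fitted is scaler)
```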

Standardize all features using the mean and std of col1:

col1 = fitted.transform(col1)
col2 = fitted.transform(col2)
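Putting the steps together, here is a self-contained sketch under the same assumptions (a DataFrame df with columns col1 and col2, using a subset of the sample rows). The new column names col1_scaled and col2_scaled are hypothetical, chosen for illustration. Because transform() returns a 2-D array, .ravel() flattens it back to 1-D before assigning it to a DataFrame column. The last lines check that the result is simply (x - mean(col1)) / std(col1) applied to col2.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Subset of the question's data; column names are assumed.
df = pd.DataFrame({
    "col1": [0.03528821669081923, 0.4755249949860231,
             -1.724856436518542, -0.31293912846313354],
    "col2": [0.4209514856338501, 0.4248427748680115,
             0.3888352811336517, 0.4178936183452606],
})

col1 = df["col1"].values.reshape(-1, 1)
col2 = df["col2"].values.reshape(-1, 1)

# Learn mean/std from col1 only, then apply the same transform to both columns.
fitted = StandardScaler().fit(col1)
df["col1_scaled"] = fitted.transform(col1).ravel()   # hypothetical column name
df["col2_scaled"] = fitted.transform(col2).ravel()   # hypothetical column name

# Equivalent by hand: (x - mean(col1)) / std(col1), with population std.
mu, sigma = df["col1"].mean(), df["col1"].std(ddof=0)
manual = (df["col2"] - mu) / sigma
print(np.allclose(manual, df["col2_scaled"]))
```

Since one linear map is applied to both columns, their positions relative to each other are preserved. With the sample data, col2 spans a much narrower range than col1, so after this transform it will still look comparatively flat on the shared plot.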