Error: Shape of passed values is (1000, 10), indices imply (1000, 11)
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(df.drop('TARGET CLASS', axis=1))
scaled_features = scaler.transform(df.drop('TARGET CLASS', axis=1))
df_feat = pd.DataFrame(scaled_features, columns=df.columns)
Answer 0 (score: 0)
The error

Shape of passed values is (1000, 10), indices imply (1000, 11)

occurs on this line

df_feat = pd.DataFrame(scaled_features, columns=df.columns)

because scaled_features has 10 columns, but df.columns has length 11. Note that both calls to df.drop('TARGET CLASS', axis=1) drop the TARGET CLASS column from df; it looks like you want that extra column dropped from the new column list as well.
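The mismatch is easy to reproduce in isolation. As a minimal sketch with made-up data, pandas raises the same ValueError whenever the column list is longer than the data is wide:

```python
import numpy as np
import pandas as pd

data = np.zeros((3, 2))    # 3 rows, 2 columns of values
cols = ['a', 'b', 'c']     # 3 column labels -- one too many

try:
    pd.DataFrame(data, columns=cols)
except ValueError as e:
    # pandas reports the shape of the data vs. the shape the labels imply
    print(e)
```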
You can fix this by saving a reference to df.drop('TARGET CLASS', axis=1) (call it df_minus_target) and passing df_minus_target.columns as the new column list:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
df_minus_target = df.drop('TARGET CLASS', axis=1)
scaler.fit(df_minus_target)
scaled_features = scaler.transform(df_minus_target)
df_feat = pd.DataFrame(scaled_features, columns=df_minus_target.columns)
Answer 1 (score: 0)
You forgot to drop the last column of df when extracting the columns to create the df_feat dataframe (it should be pd.DataFrame(scaled_features, columns=df.drop('TARGET CLASS', axis=1).columns)); see the full reproducible example below:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
# Mock your dataset:
df = pd.DataFrame(np.random.rand(5, 10))
df = pd.concat([df, pd.Series([1, 1, 0, 0, 1], name='TARGET CLASS')], axis=1)
scaler = StandardScaler()
scaler.fit(df.drop('TARGET CLASS', axis=1))
scaled_features = scaler.transform(df.drop('TARGET CLASS', axis=1))
df_feat = pd.DataFrame(scaled_features, columns=df.drop('TARGET CLASS', axis=1).columns)
print(df_feat)
Or, to prevent this kind of error in the future, first extract the feature columns you want to process into a separate dataframe:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
# Mock your dataset:
df = pd.DataFrame(np.random.rand(5, 10))
df = pd.concat([df, pd.Series([1, 1, 0, 0, 1], name='TARGET CLASS')], axis=1)
# Extract raw features columns first.
df_feat = df.drop('TARGET CLASS', axis=1)
# Do transformations.
scaler = StandardScaler()
scaler.fit(df_feat)
scaled_features = scaler.transform(df_feat)
df_feat_scaled = pd.DataFrame(scaled_features, columns=df_feat.columns)
print(df_feat_scaled)
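As a side note, since the scaler is fitted and applied to the same data, the fit/transform pair can be collapsed into a single fit_transform call (a minor variation on the example above, same mock data):

```python
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler

# Mock the dataset the same way as above:
df = pd.DataFrame(np.random.rand(5, 10))
df = pd.concat([df, pd.Series([1, 1, 0, 0, 1], name='TARGET CLASS')], axis=1)

df_feat = df.drop('TARGET CLASS', axis=1)
scaler = StandardScaler()
# fit_transform fits the scaler and transforms the data in one call
scaled_features = scaler.fit_transform(df_feat)
df_feat_scaled = pd.DataFrame(scaled_features, columns=df_feat.columns)
print(df_feat_scaled.shape)  # (5, 10)
```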