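I am trying to run PCA on a dataset, but I am running into a problem involving NaN values. I have tried dropping several columns and changing the dtypes of the DataFrame, but neither helped.
A snippet of my code: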
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

features = ['caloroies', 'protein', 'fat', 'sodium', 'fiber', 'carbo', 'sugars', 'potass', 'vitamins', 'shelf', 'weight', 'cups']
x = df.loc[:, features].values
y = df.loc[:, ['rating_bucketed']].values
x = StandardScaler().fit_transform(x)

pca = PCA(n_components=2)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data=principalComponents,
                           columns=['principal component 1', 'principal component 2'])
The error I get from this is:
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
If I inspect the x variable, I see the following:
print(x)
[[ nan 4.00e+00 1.00e+00 1.30e+02 1.00e+01 5.00e+00 6.00e+00 2.80e+02
2.50e+01 3.00e+00 1.00e+00 3.30e-01]
[ nan 3.00e+00 5.00e+00 1.50e+01 2.00e+00 8.00e+00 8.00e+00 1.35e+02
0.00e+00 3.00e+00 1.00e+00 1.00e+00]
[ nan 4.00e+00 1.00e+00 2.60e+02 9.00e+00 7.00e+00 5.00e+00 3.20e+02
2.50e+01 3.00e+00 1.00e+00 3.30e-01]
[ nan 4.00e+00 0.00e+00 1.40e+02 1.40e+01 8.00e+00 0.00e+00 3.30e+02
2.50e+01 3.00e+00 1.00e+00 5.00e-01]
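Since every NaN above sits in the first column, one way to narrow this down might be to compare the requested feature names against df.columns and count the NaN values per column. The sketch below is only an illustration under assumptions: the small DataFrame is made-up data (not my real dataset), only three of the features are used, and reindex stands in for the .loc selection because it fills any label that is absent from the frame with NaN.

```python
import pandas as pd

# Made-up stand-in for the real dataset (values are illustrative only).
df = pd.DataFrame({
    "calories": [70, 120, 70, 50],
    "protein": [4, 3, 4, 4],
    "fat": [1, 5, 1, 0],
})

# Requested feature names, with the first one copied from the snippet above.
features = ["caloroies", "protein", "fat"]

# Names that are requested but are not actual columns of df.
missing_columns = [name for name in features if name not in df.columns]

# reindex yields an all-NaN column for any label that is absent from df.
subset = df.reindex(columns=features)

# Number of NaN values in each requested column.
nan_counts = subset.isna().sum()

print("Not in df.columns:", missing_columns)
print(nan_counts)
```

A column that shows up in missing_columns, or whose NaN count equals the number of rows, would be the first place to look.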
Just so you can get an idea of my starting environment:
Python 3.7.1, NumPy 1.15.4, pandas 0.23.4, sklearn 0.20.1
Can anyone point me in the right direction as to where I am going wrong?