Question

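I am trying to run PCA on a dataset, but I keep running into a problem involving NaN. I tried dropping several columns and changing the dtypes of the DataFrame, but neither worked.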

A piece of my code:

import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

features = ['caloroies','protein','fat','sodium','fiber','carbo','sugars','potass','vitamins','shelf','weight','cups']

# Separate the feature matrix from the target column
x = df.loc[:, features].values
y = df.loc[:, ['rating_bucketed']].values

# Standardize the features before PCA
x = StandardScaler().fit_transform(x)

# Project onto the first two principal components
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data=principalComponents,
                           columns=['principal component 1', 'principal component 2'])

The error I get from this is:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
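
The message names more than one possible cause, so it can help to narrow down which one applies. The following is a minimal sketch, not part of the original code: `describe_non_finite` is a hypothetical helper and `sample` is made-up data standing in for `x`. It counts NaN and infinite entries per column with NumPy.

```python
import numpy as np

def describe_non_finite(x):
    """Count NaN and +/-infinity entries per column of a 2-D array."""
    x = np.asarray(x, dtype=float)
    return {
        "nan": np.isnan(x).sum(axis=0),
        "inf": np.isinf(x).sum(axis=0),
    }

# Made-up sample standing in for the real feature matrix.
sample = np.array([[np.nan, 4.0, 1.0],
                   [np.nan, 3.0, np.inf],
                   [np.nan, 4.0, 1.0]])

report = describe_non_finite(sample)
print("NaN per column:", report["nan"])
print("inf per column:", report["inf"])
```

A column whose NaN count equals the number of rows points to a problem with the whole column rather than with individual missing values.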

If I inspect the x variable, I get the following:

print(x)

    [[     nan 4.00e+00 1.00e+00 1.30e+02 1.00e+01 5.00e+00 6.00e+00 2.80e+02
  2.50e+01 3.00e+00 1.00e+00 3.30e-01]
 [     nan 3.00e+00 5.00e+00 1.50e+01 2.00e+00 8.00e+00 8.00e+00 1.35e+02
  0.00e+00 3.00e+00 1.00e+00 1.00e+00]
 [     nan 4.00e+00 1.00e+00 2.60e+02 9.00e+00 7.00e+00 5.00e+00 3.20e+02
  2.50e+01 3.00e+00 1.00e+00 3.30e-01]
 [     nan 4.00e+00 0.00e+00 1.40e+02 1.40e+01 8.00e+00 0.00e+00 3.30e+02
  2.50e+01 3.00e+00 1.00e+00 5.00e-01]
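
Only the first column of x is NaN, and it corresponds to the first entry of `features`, `'caloroies'`. One possible explanation (an assumption, since the post does not show `df.columns`) is that this name does not match any column in `df`. Older pandas versions such as 0.23 fill a missing label requested through `.loc` with NaN and emit a warning, whereas pandas 1.0 and later raise a `KeyError`. A minimal sketch of how to check this, using a small made-up DataFrame in place of the real data:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataset; the column names are assumptions.
df = pd.DataFrame({
    "calories": [70, 120, 70],
    "protein": [4, 3, 4],
    "fat": [1, 5, 1],
})

features = ["caloroies", "protein", "fat"]  # first name spelled as in the post

# Names that were requested but are not columns of the DataFrame.
missing = [name for name in features if name not in df.columns]
print("missing columns:", missing)

# reindex fills labels that do not exist with NaN on any pandas version,
# which reproduces the all-NaN first column.
x = df.reindex(columns=features).values.astype(float)
nan_per_column = dict(zip(features, np.isnan(x).sum(axis=0)))
print("NaN count per column:", nan_per_column)
```

If `missing` is non-empty for the real data, correcting the name in `features` would be the first thing to try before looking at imputation or dtype changes.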

To give you an idea of my starting dataset:

[Image: screenshot of the starting dataset]

Python 3.7.1, NumPy 1.15.4, pandas 0.23.4, scikit-learn 0.20.1

Can anyone point me in the right direction?

My input contains NaN, infinity, or a value too large for the dtype

0 Answers: