Question

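I am working with Isolation Forest for APT classification, and the results are encouraging. I am now trying to improve performance through feature selection, and I am considering using PCA.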

import pandas as pd
from sklearn.decomposition import PCA
from sklearn.model_selection import KFold

df = pd.read_csv("features_labeled_PULITOIsolationFor.csv")
apt_list = list(set(df.apt))[1:]  # the first entry is NaN, so I discard it

for apt_name in apt_list:
    # Append the current APT name to the log file
    out_file = open("test.txt", "a")
    out_file.write(apt_name + "\n" + "\n")
    out_file.close()
    df_apt17 = df[df["apt"] == apt_name]
    df_other = df[df["apt"] != apt_name]
    nf = 150
    pca = PCA(n_components=nf)
    # fit_transform returns a NumPy array, not a DataFrame
    df_apt17 = pca.fit_transform(df_apt17.drop("apt", axis=1))
    print(df_apt17.shape)
    kf = KFold(n_splits=10, random_state=1, shuffle=True)
    df_apt17 = df_apt17.reset_index()  # this is the line that raises the error
    df_other = df_other.reset_index()

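After this, I run cross-validation: I split the samples of the APT in question into K folds and test on a fold of the current APT merged with the df_other dataframe.

However, although PCA appears to work and the features are reduced (as .shape shows on the dataframe), the reset_index() call raises an error:

    df_apt17 = df_apt17.reset_index()
AttributeError: 'numpy.ndarray' object has no attribute 'reset_index'

How can I handle this?

Thanks, everyone.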

Answer 1

According to the sklearn documentation (http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html), the method pca.fit_transform() returns an array.

Arrays do not have a reset_index() method; only a pandas.DataFrame does.
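This can be checked quickly. The following is a minimal sketch on synthetic data; the column names and sizes are made up for illustration and are not taken from the question's CSV. It assumes scikit-learn's default output configuration.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Synthetic stand-in for the question's feature table
# (assumed shape: 20 samples, 5 features).
rng = np.random.default_rng(0)
features = pd.DataFrame(
    rng.normal(size=(20, 5)),
    columns=[f"f{i}" for i in range(5)],
)

reduced = PCA(n_components=2).fit_transform(features)

print(type(reduced))                    # <class 'numpy.ndarray'>
print(reduced.shape)                    # (20, 2)
print(hasattr(reduced, "reset_index"))  # False
```

Even though the input was a DataFrame, the result is a plain NumPy array, which is why the reset_index() call fails.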

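fit_transform(X, y=None)

Fit the model with X and apply the dimensionality reduction on X.

Parameters: X : array-like, shape (n_samples, n_features)

Training data, where n_samples is the number of samples and n_features is the number of features.

y : Ignored.

Returns: X_new : array-like, shape (n_samples, n_components)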

If you need to use reset_index(), you have to convert the result back to a pandas.DataFrame.
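One way to do that conversion is sketched below, again on synthetic data. The "pc1", "pc2" column names and the index values are illustrative assumptions, not something PCA or the question's dataset provides.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Synthetic stand-in for df_apt17 after dropping the "apt" column (hypothetical data).
rng = np.random.default_rng(0)
features = pd.DataFrame(
    rng.normal(size=(20, 5)),
    columns=[f"f{i}" for i in range(5)],
    index=range(100, 120),  # non-default index, like the one boolean filtering leaves behind
)

n_components = 2
pca = PCA(n_components=n_components)
reduced = pca.fit_transform(features)  # NumPy array of shape (20, 2)

# Wrap the array in a DataFrame, keeping the original row index.
# The "pc" column names are chosen here for readability only.
reduced_df = pd.DataFrame(
    reduced,
    columns=[f"pc{i + 1}" for i in range(n_components)],
    index=features.index,
)

# reset_index() is available again. drop=True discards the old index
# instead of inserting it as an extra "index" column.
reduced_df = reduced_df.reset_index(drop=True)
print(reduced_df.shape)  # (20, 2)
```

Note that the question's code calls reset_index() without drop=True, which adds the old index as a new column. If that column is not wanted as a feature in the later cross-validation, drop=True avoids it.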
