sklearn bestfeatures.fit(X, Y): what does it mean, and how do I define X and Y?

Posted: 2021-08-01 15:47:15

Tags: python-3.x scikit-learn

My code doesn't work, and I think it's because X and Y are never defined. I got the code from a book, but the book doesn't actually say how they are defined.

import pandas as pd
from matplotlib import pyplot
import seaborn as sns
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
from sklearn.datasets import load_digits
from pandas import read_csv
from pandas.plotting import scatter_matrix


filename = '/Users/rahulparmeshwar/Documents/Algo Bots/Data/Live Data/Tester.csv'
data = read_csv(filename)

correlation = data.corr()

bestfeatures = SelectKBest(k=5)
fit = bestfeatures.fit(X,Y)

dfscores = pd.DataFrame(fit.scores_)
dfcolumns = pd.DataFrame(X.columns)
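featurescores = pd.concat([dfcolumns,dfscores],axis=1)
featurescores.columns = ['feature', 'score']  # name the columns so nlargest can refer to them

pd.set_option('display.width',100)
data.head(1)
print(data)

scatter_matrix(data)
pyplot.show()

# nlargest takes the number of rows and the column name as two separate arguments
print(featurescores.nlargest(2, 'score'))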
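The script imports load_digits and chi2 but never uses them. As a hedged sketch (assuming scikit-learn 0.23 or later, which added the as_frame option), here is the same score table built on that bundled digits dataset, with X and y defined explicitly. The column names 'feature' and 'score' are chosen here for illustration; chi2 requires non-negative feature values, which the pixel intensities satisfy.

```python
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectKBest, chi2

# as_frame=True returns the 64 pixel columns as a DataFrame and the labels as a Series
digits = load_digits(as_frame=True)
X = digits.data    # features: one column per pixel
y = digits.target  # target: the digit (0-9) that each row depicts

# Score every column of X against y with the chi-squared test; k=5 is what transform() would keep
bestfeatures = SelectKBest(score_func=chi2, k=5)
fit = bestfeatures.fit(X, y)

# Pair each column name with its score in a two-column table
dfscores = pd.DataFrame(fit.scores_)
dfcolumns = pd.DataFrame(X.columns)
featurescores = pd.concat([dfcolumns, dfscores], axis=1)
featurescores.columns = ['feature', 'score']

# nlargest(n, column) returns the n rows with the highest values in that column
top2 = featurescores.nlargest(2, 'score')
print(top2)
```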

I've checked the scikit-learn documentation, but it wasn't very helpful. Any help would be greatly appreciated.

1 Answer:

Answer 0 (score: 1)

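X and y should be the features and the target variable that you load from your data file: X holds one column per candidate feature, and y holds the value to predict for each row. SelectKBest scores each column of X against y and keeps the k highest-scoring columns. Here is a typical way to define them: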

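data = read_csv(filename)
# 'target variable name' stands for the name of the column you want to predict
y = data['target variable name']
# every other column is a candidate feature
X = data.drop('target variable name', axis=1)
fit = bestfeatures.fit(X, y)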
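A minimal, self-contained sketch of the same pattern, assuming a classification target. The CSV content and the column names f1, f2, f3 and target are made up for illustration; read_csv is fed from a string through StringIO so the example runs without a file.

```python
from io import StringIO

import pandas as pd
from sklearn.feature_selection import SelectKBest

# Made-up stand-in for the CSV file; 'target' is a hypothetical column name
csv_text = """f1,f2,f3,target
1,5,1,0
2,1,2,0
3,3,3,0
10,4,3,1
11,2,4,1
12,6,5,1
"""
data = pd.read_csv(StringIO(csv_text))

# y: the column to predict; X: every other column
y = data['target']
X = data.drop('target', axis=1)

# With no score_func given, SelectKBest uses f_classif (ANOVA F-test),
# which expects class labels as the target, like the 0/1 column here
bestfeatures = SelectKBest(k=2)
fit = bestfeatures.fit(X, y)

print(fit.scores_)                       # one score per column of X
selected = X.columns[fit.get_support()]  # names of the k best-scoring columns
print(list(selected))                    # ['f1', 'f3']
```

If the target is continuous (a price, for example), f_regression from sklearn.feature_selection is the regression counterpart to the default f_classif and can be passed as score_func; chi2 additionally requires every feature value to be non-negative.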