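I want to find the correlation between my parameters and the result. I have about 60 parameters and want to select the most relevant ones (say, 5-10 parameters). Here is my code: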
from sklearn.linear_model import LogisticRegression
from numpy import array

# Lines 1 .. Pocet* - 1 of each file are read below; line 0 is skipped.
PocetTest = 220
PocetTrain = 1301
# Column indices used as features.
col = list(range(5, 34))
X = []
Y = []
Xnew = []
Yreal = []
DataTrain = []
DataTest = []
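# Read the training rows; line 0 is skipped (presumably a header row).
with open("TrainDataTransformed.txt", "r") as f:
    lines = f.readlines()
for i in range(1, PocetTrain):
    G = []
    for d in lines[i].rstrip("\n").split("\t"):
        try:
            G.append(float(d))
        except ValueError:
            G.append(d)  # keep non-numeric fields as text
    DataTrain.append(G)
# Read the test rows the same way.
with open("TestDataTransformed.txt", "r") as f:
    lines = f.readlines()
for i in range(1, PocetTest):
    G = []
    for d in lines[i].rstrip("\n").split("\t"):
        try:
            G.append(float(d))
        except ValueError:
            G.append(d)  # keep non-numeric fields as text
    DataTest.append(G)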
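# array() turns rows that mix text and numbers into an array of strings,
# so numeric columns are converted back to float before use.
DataTrain = array(DataTrain)
DataTest = array(DataTest)
X = DataTrain[:, col].astype(float)
#print(X[0])
for d in DataTest[:, col]:
    Xnew.append(list(map(float, d)))
# Encode the outcome from columns 3 and 4:
# 2 if column 3 > column 4, 1 if they are equal, 0 otherwise.
for d in DataTest:
    a, b = float(d[3]), float(d[4])
    if a > b:
        Yreal.append(2)
    elif a == b:
        Yreal.append(1)
    else:
        Yreal.append(0)
Yreal = array(Yreal)
for d in DataTrain:
    a, b = float(d[3]), float(d[4])
    if a > b:
        Y.append(2)
    elif a == b:
        Y.append(1)
    else:
        Y.append(0)
Y = array(Y)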
# fit final model
model = LogisticRegression()
model.fit(X, Y)
# new instances where we do not know the answer
# make a prediction
ynew = model.predict_proba(Xnew)
Which parameters in X are most strongly correlated with Y (the result)? Any suggestions on which function I should use? I am using scikit-learn.
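To make the question concrete, here is a minimal sketch of one thing I could imagine trying: univariate feature ranking with scikit-learn's `SelectKBest` and the ANOVA F-test (`f_classif`). The data below is synthetic and only stands in for my real `X` and `Y`; the names `X_demo` and `Y_demo`, the sample size, and the choice of `k=5` are assumptions made for illustration, not part of my actual setup.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for the real data (an assumption): 300 rows, 29 features,
# and three classes (0, 1, 2) as in the code above.
rng = np.random.default_rng(0)
n_samples, n_features = 300, 29
X_demo = rng.normal(size=(n_samples, n_features))
Y_demo = rng.integers(0, 3, size=n_samples)

# Make features 2 and 7 depend on the class so they carry real signal.
X_demo[:, 2] += 2.0 * Y_demo
X_demo[:, 7] -= 1.5 * Y_demo

# Score each feature against the class labels with a one-way ANOVA F-test
# and keep the k highest-scoring ones.
selector = SelectKBest(score_func=f_classif, k=5)
selector.fit(X_demo, Y_demo)

# Positions (within X_demo) of the selected features, highest score first.
selected = selector.get_support(indices=True)
ranking = sorted(selected, key=lambda j: selector.scores_[j], reverse=True)
for j in ranking:
    print(f"feature {j}: F = {selector.scores_[j]:.1f}")
```

With my real data, position `j` here would correspond to file column `col[j]`. The F-test only scores each feature on its own, so `mutual_info_classif` might be a reasonable alternative `score_func` if the relationship is not a simple shift in class means.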