Finding the correlation between parameters and the result

Time: 2019-05-11 22:16:11

Tags: python machine-learning scikit-learn correlation

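I want to find the correlation between my parameters and the result. I have about 60 parameters and would like to select the most relevant ones (say 5-10 parameters). Here is my code: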

from sklearn.linear_model import LogisticRegression
from numpy import array

PocetTest = 220
PocetTrain = 1301
# Indices of the feature columns (5 to 33 inclusive)
col = list(range(5, 34))

X = []
Y = []
Xnew = []
Yreal = []
DataTrain = []
DataTest = []

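# Read the training rows, starting from the second line (index 1)
with open("TrainDataTransformed.txt", "r") as f:
    lines = f.readlines()
for i in range(1, PocetTrain):
    G = []
    for d in lines[i].rstrip("\n").split("\t"):
        try:
            G.append(float(d))
        except ValueError:
            # keep non-numeric fields as strings
            G.append(d)
    DataTrain.append(G)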

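# Read the test rows the same way
with open("TestDataTransformed.txt", "r") as f:
    lines = f.readlines()
for i in range(1, PocetTest):
    G = []
    for d in lines[i].rstrip("\n").split("\t"):
        try:
            G.append(float(d))
        except ValueError:
            # keep non-numeric fields as strings
            G.append(d)
    DataTest.append(G)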

DataTrain = array(DataTrain)
DataTest = array(DataTest)

# Feature columns as floats (the arrays hold strings if any field was non-numeric)
X = DataTrain[:, col].astype(float)
Xnew = DataTest[:, col].astype(float)

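# Encode the test outcome from columns 3 and 4: 2 = greater, 1 = equal, 0 = smaller
for d in DataTest:
    a, b = float(d[3]), float(d[4])  # compare as numbers, not as strings
    if a > b:
        Yreal.append(2)
    elif a == b:
        Yreal.append(1)
    elif a < b:
        Yreal.append(0)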

Yreal = array(Yreal)

# Encode the training outcome the same way
for d in DataTrain:
    a, b = float(d[3]), float(d[4])  # compare as numbers, not as strings
    if a > b:
        Y.append(2)
    elif a == b:
        Y.append(1)
    elif a < b:
        Y.append(0)

Y = array(Y)


# fit final model
model = LogisticRegression()
model.fit(X, Y)

# new instances where we do not know the answer

# make a prediction
ynew = model.predict_proba(Xnew)

Which parameters in X are most strongly correlated with Y (the result)? Any suggestions on which function I should use? I am using scikit-learn.
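To make the question concrete, here is a minimal sketch of one possible approach, not a definitive answer: univariate feature ranking with scikit-learn's `SelectKBest` and the ANOVA F-test (`f_classif`). The data below is synthetic (generated with `make_classification`) and only stands in for the real `X` and `Y`. The 60 features, 3 classes and `K = 10` are assumptions taken from the description above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for the real data (assumption): 60 features, 3 classes.
X, Y = make_classification(
    n_samples=1300,
    n_features=60,
    n_informative=8,
    n_redundant=0,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=0,
)

K = 10  # number of features to keep (assumed; the question mentions 5-10)

# Score every column of X against Y with a one-way ANOVA F-test.
selector = SelectKBest(score_func=f_classif, k=K)
selector.fit(X, Y)

# Column indices ordered from highest to lowest F-score.
ranking = np.argsort(selector.scores_)[::-1]
top_k = ranking[:K]

# Indices of the columns that SelectKBest itself would keep.
selected = selector.get_support(indices=True)

for idx in top_k:
    print(f"column {idx}: F = {selector.scores_[idx]:.2f}")

# Reduced feature matrix containing only the K best columns.
X_reduced = selector.transform(X)
```

With the real data, `X` and `Y` would be the arrays built in the code above, and the reported indices would be positions within `col`, not columns of the original file. The F-test only captures how far the class means differ per feature; `mutual_info_classif` from the same module can be passed as `score_func` if non-linear dependence matters.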

0 answers:

No answers