hr Time0 Time1 Day Time2 cluster1 cluster3
0 20 11/4/2017 20:39 Night 3 0 Entertainment 2
1 1 21/03/2017 01:33:48 Night 3 0 Work 1
2 22 16/03/2017 22:26:15 Night 5 0 Work 1
3 2 2/4/2017 2:03 Night 1 0 Work 1
4 2 2/4/2017 2:03 Night 1 0 Work 1
5 2 2/4/2017 2:03 Night 1 0 Work 1
6 19 8/4/2017 19:02 Night 7 0 Entertainment 2
7 11 17/03/2017 11:17:19 Day 6 1 Entertainment 2
8 22 16/03/2017 22:28:58 Night 5 0 Work 1
9 2 2/4/2017 2:03 Night 1 0 Work 1
10 2 2/4/2017 2:03 Night 1 0 Work 1
11 2 2/4/2017 2:03 Night 1 0 Work 1
12 2 2/4/2017 2:03 Night 1 0 Work 1
13 2 2/4/2017 2:03 Night 1 0 Work 1
14 0 5/4/2017 0:46 Night 4 0 Entertainment 2
15 0 5/4/2017 0:46 Night 4 0 Entertainment 2
16 20 11/4/2017 20:37 Night 3 0 Entertainment 2
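Using my dataset, I ran a logistic regression to predict the cluster associated with hr, but this code always predicts the same cluster, label 1.
Here is my code: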
import csv
import pandas as pd
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
noti=pd.read_csv('C:\\path\\to\\Final.csv', index_col=0)
Time=[]
Group=[]
NewTime=[]
NewGroup=[]
with open('C:\\path\\to\\Final.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    for row in readCSV:
        Time.append(row[1])
        Group.append(row[8])
for i in Time[1:]:
    NewTime.append(i)
for i in Group[1:]:
    NewGroup.append(i)
X=pd.DataFrame(NewTime)
X.columns = ['Time']
y=pd.DataFrame(NewGroup)
y.columns=['Group']
print(X)
print(y)
from sklearn.linear_model import LogisticRegression
# Create logistic regression object
model = LogisticRegression()
# Train the model using the training sets and check score
model.fit(X, y)
model.score(X, y)
# Equation coefficient and intercept
print('Coefficient: \n', model.coef_)
print('Intercept: \n', model.intercept_)
#predicted= model.predict(X)
#print(predicted)
res = model.predict(X)
print(pd.DataFrame(res))
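
To narrow down the "always predicts 1" behaviour, it may help to compare the model's accuracy with the share of the most frequent label. The sketch below is only a diagnostic under stated assumptions: it hard-codes the 17 sample rows shown above instead of reading Final.csv, and the names `df`, `group`, `class_counts` and `majority_share` are illustrative, not part of the original script. It also casts the hour to a number and passes the labels as a 1-D Series, which the original code does not do.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Assumption: the 17 sample rows from the question (hour and final group column),
# typed in by hand instead of being read from Final.csv.
hr = [20, 1, 22, 2, 2, 2, 19, 11, 22, 2, 2, 2, 2, 2, 0, 0, 20]
group = [2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2]
df = pd.DataFrame({"hr": hr, "group": group})

X = df[["hr"]].astype(float)  # 2-D numeric feature matrix
y = df["group"].astype(int)   # 1-D label vector

# How unbalanced are the labels, and what would "always guess the
# most frequent label" score on this sample?
class_counts = y.value_counts().to_dict()
majority_share = y.value_counts(normalize=True).max()

model = LogisticRegression()
model.fit(X, y)
pred = model.predict(X)
accuracy = model.score(X, y)

print("Label counts:", class_counts)
print("Majority-label share:", round(majority_share, 3))
print("Model accuracy:", round(accuracy, 3))
print("Predicted label counts:", pd.Series(pred).value_counts().to_dict())
```

If the model accuracy is close to the majority-label share and the predicted counts contain a single label, the raw hour alone is probably not separating the two groups on this data.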