Predictive model: logistic regression, a popular classification model

Time: 2020-10-17 00:41:45

Tags: python machine-learning scikit-learn

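I am building a prediction based on the winning party. The columns I selected are Candidate Party and Votes for Candidate, named as they appear in the dataset. My code is as follows: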

import pandas as pd

# Loading and cleaning dataset
df4 = pd.read_csv('Election-Results-2018 - Parlimen_Results_By_Candidate.csv')
df4['Votes for Candidate'] = df4['Votes for Candidate'].str.replace(',','').astype(float)
df4['Total Votes Cast'] = df4['Total Votes Cast'].str.replace(',','').astype(float)
df4['% of total Votes'] = df4['% of total Votes'].str.replace('%','').astype(float)

# Step 1 - import the model 
from sklearn.linear_model import LogisticRegression

# Step 2 - Define your training data
columns = ['Candidate Party', 'Votes for Candidate']

# Step 3 - create training dataset
X = df[columns]
y = df['New Results']

After running this code, I get the following error:

KeyError: "None of [Index(['Candidate Party', 'Votes for Candidate'], dtype='object')] are in the [columns]"

I am a beginner in machine learning and would appreciate any help and guidance. Thank you.

1 answer:

Answer 0: (score: 0)

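This is a simple mistake: you loaded the data into df4 but then selected the columns from df. The error is a KeyError rather than a NameError, so a DataFrame named df does exist in your session, it just does not contain those columns. Using df4 should work: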

import pandas as pd

# Load and clean the dataset
df4 = pd.read_csv('Election-Results-2018 - Parlimen_Results_By_Candidate.csv')
df4['Votes for Candidate'] = df4['Votes for Candidate'].str.replace(',','').astype(float)
df4['Total Votes Cast'] = df4['Total Votes Cast'].str.replace(',','').astype(float)
df4['% of total Votes'] = df4['% of total Votes'].str.replace('%','').astype(float)

# Step 1 - import the model 
from sklearn.linear_model import LogisticRegression

# Step 2 - Define your training data
columns = ['Candidate Party', 'Votes for Candidate']

# Step 3 - create training dataset (select from df4, the DataFrame loaded above)
X = df4[columns]
y = df4['New Results']
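
To see why the original selection failed, it can help to check which of the requested names are actually present in each DataFrame before indexing. The snippet below is a minimal sketch, not the asker's real data: the rows in df4, the unrelated df frame, and the helper missing_columns are all made up for illustration. It also one-hot encodes the text column with pd.get_dummies, on the assumption that Candidate Party holds party names as strings.

```python
import pandas as pd

# Hypothetical stand-in for the election CSV; these rows are invented.
df4 = pd.DataFrame({
    "Candidate Party": ["A", "B", "A", "C"],
    "Votes for Candidate": ["1,200", "3,400", "560", "7,890"],
    "New Results": [0, 1, 0, 1],
})
# A second, unrelated frame that happens to be named df, as in the question.
df = pd.DataFrame({"unrelated": [1, 2, 3]})

columns = ["Candidate Party", "Votes for Candidate"]


def missing_columns(frame, wanted):
    """Return the names in `wanted` that are not columns of `frame`."""
    return [name for name in wanted if name not in frame.columns]


# df lacks both names, which is the situation that makes df[columns] raise KeyError.
print(missing_columns(df, columns))
# df4 contains both names, so df4[columns] succeeds.
print(missing_columns(df4, columns))

# Same cleaning step as in the question: drop thousands separators, cast to float.
df4["Votes for Candidate"] = (
    df4["Votes for Candidate"].str.replace(",", "", regex=False).astype(float)
)

X = df4[columns]
y = df4["New Results"]

# 'Candidate Party' is text; one-hot encode it so every feature column is numeric.
X_encoded = pd.get_dummies(X, columns=["Candidate Party"])
print(X_encoded.columns.tolist())
```

scikit-learn estimators such as LogisticRegression expect numeric features, so if Candidate Party really is a string column, fitting on X_encoded rather than X is likely to be the next step once the KeyError is resolved.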