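I am trying to run a logistic regression that predicts income from age, num, and hours per week. The income column contains either <=50K or >50K. I tried using the pandas .map() function to replace the categorical values with the numbers shown below and got this error:
'DataFrame' object has no attribute 'map'
I then tried adding .rdd (as shown below) but got this error:
'DataFrame' object has no attribute 'rdd'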
import pandas as pd
import statsmodels.api as sm
adult_train = pd.read_csv("C:/.../adult_training.csv")
adult_test = pd.read_csv("C:/.../adult_test.csv")
# Separate data into predictor variables, X, and target variables, y:
X = pd.DataFrame(adult_train[['age', 'hours-per-week', 'num']])
X = sm.add_constant(X)
y = pd.DataFrame(adult_train[['income']]).rdd.map({'<=50K': 0, '>50K': 1}).astype(int)
logreg01 = sm.Logit(y, X).fit()
I would appreciate any help getting the last line of code to run.
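One likely explanation, offered as a sketch rather than a definitive fix: double brackets (adult_train[['income']]) return a DataFrame, while .map() with a dict is a Series method, and .rdd belongs to PySpark DataFrames, not pandas. Selecting the column with single brackets gives a Series that can be mapped. The small DataFrame below is a hypothetical stand-in for adult_training.csv (only the column names come from the post), and the .str.strip() call is an assumption that the raw labels might carry stray whitespace.

```python
import pandas as pd

# Hypothetical stand-in for adult_training.csv; only the column names
# are taken from the question, the values are made up for illustration.
adult_train = pd.DataFrame({
    "age": [25, 38, 52, 41],
    "hours-per-week": [40, 50, 45, 60],
    "num": [9, 13, 10, 14],
    "income": ["<=50K", ">50K", "<=50K", ">50K"],
})

# Single brackets return a Series, which has .map();
# double brackets would return a DataFrame instead.
income = adult_train["income"]

# Assumption: labels may have stray whitespace in the raw file,
# so strip them before looking them up in the dict.
y = income.str.strip().map({"<=50K": 0, ">50K": 1})

# Labels missing from the dict become NaN, so check before casting to int.
assert y.notna().all(), "unexpected income label"
y = y.astype(int)

print(y.tolist())  # [0, 1, 0, 1]
```

With y built this way as a 0/1 integer Series, it can be passed to the original sm.Logit(y, X).fit() line unchanged. If the assertion fails on the real file, printing adult_train["income"].unique() should show which labels the dict does not cover.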