upper scope has term ‘NA’ not included in model

Posted: 2017-05-02 10:30:43

Tags: r

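I am working with a dataset and want to run a stepwise logistic regression on a set of its variables. For this I am using the add1() function in R. A sample of the dataset can be downloaded from this link: https://drive.google.com/file/d/0B0N-Nc7kEi4bVjhDd1FDaEE5cEE/view?usp=sharing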

I fit the logistic regression as follows:

train <- read.csv('training.csv')
glm.model_step_1 <- glm(loan_status ~ acc_open_past_24mths + annual_inc + avg_cur_bal +
                    bc_open_to_buy + delinq_2yrs + dti + inq_last_6mths + installment + int_rate +
                    mo_sin_old_il_acct + mo_sin_old_rev_tl_op + mo_sin_rcnt_rev_tl_op + mo_sin_rcnt_tl +
                    mort_acc + mths_since_last_delinq + mths_since_recent_bc + mths_since_recent_inq +
                    num_accts_ever_120_pd + num_actv_bc_tl + num_actv_rev_tl + num_bc_tl + num_il_tl +
                    num_op_rev_tl + num_tl_op_past_12m + pct_tl_nvr_dlq + percent_bc_gt_75 +
                    pub_rec_bankruptcies + revol_bal + revol_util + term + total_acc + total_bc_limit +
                    total_il_high_credit_limit + fico_mean + addr_state + emp_length +
                    verification_status + Count_NA + Info_missing + Engineer + Teacher + Doctor +
                    Professor + Manager + Director + Analyst + senior + lead + consultant +
                    home_ownership_own + home_ownership_rent + purpose_debt_consolidation +
                    purpose_medical + purpose_credit_card + purpose_other,
                    data = train, 
                    family = binomial(link = 'logit'))

and then use the add1() function for forward selection:

add1(glm.model_step_1, scope = train)

This code does not work. I get the following error:

Error in factor.scope(attr(terms1, "factors"), list(add = attr(terms2, : upper scope has term ‘NA’ not included in model

Does anyone know how to fix this error?

A question previously asked on datascience.stackexchange (https://datascience.stackexchange.com/questions/11604/checking-regression-coefficients-stability) mentions checking for NAs. However, the dataset contains no NAs, which can be confirmed by running:

sapply(train, function(x) sum(is.na(x)))
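A column-wise NA count does not catch every import problem: a numeric column that contains even one non-numeric entry is imported by read.csv() as a factor (or, from R 4.0.0 onward, as character), and none of its values are NA. The sketch below assumes a tiny made-up CSV in place of training.csv and uses a hypothetical helper, non_numeric_cols(), to show that the NA count stays at zero while a class check flags the column.

```r
# Hypothetical helper: names of columns that were NOT imported as numeric
non_numeric_cols <- function(df) {
  names(df)[!vapply(df, is.numeric, logical(1))]
}

# Made-up stand-in for training.csv (assumption): "n/a" is not a default NA string
csv_text <- "loan_status,annual_inc,dti
0,50000,12.5
1,n/a,30.1
0,72000,8.9"
train <- read.csv(text = csv_text)

# The NA check from the question reports zero NAs in every column...
na_counts <- sapply(train, function(x) sum(is.na(x)))
print(na_counts)

# ...but annual_inc was read as factor/character instead of numeric
bad_cols <- non_numeric_cols(train)
print(bad_cols)

# Coercing turns the unparseable entry into an explicit NA
train$annual_inc <- suppressWarnings(as.numeric(as.character(train$annual_inc)))
print(sum(is.na(train$annual_inc)))
```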

1 answer:

Answer 0 (score: 1)

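@Jash Sash's train dataset contains some anomalous values that force read.csv to read several numeric variables as factors with many levels.
In any case, here I consider a model with only a few variables to show how to avoid the error reported above. Note that the scope argument must be “a formula giving the terms to be considered for adding or dropping”; it cannot be a data.frame, as it is in @Jash Sash's code.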
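When there are dozens of candidate predictors, typing the scope formula by hand is tedious. A minimal sketch, assuming simulated data that only borrows a few column names from the question, builds the upper-scope formula from a character vector of column names with reformulate() and passes that formula to add1():

```r
set.seed(1)
# Simulated stand-in for training.csv (assumption): column names borrowed from the question
n <- 500
train <- data.frame(
  acc_open_past_24mths = rpois(n, 4),
  avg_cur_bal = rnorm(n, 10, 3),
  bc_open_to_buy = rnorm(n, 5, 2),
  delinq_2yrs = rpois(n, 0.3),
  int_rate = runif(n, 5, 25)
)
train$loan_status <- rbinom(n, 1, plogis(-3 + 0.12 * train$int_rate))

fit <- glm(loan_status ~ acc_open_past_24mths + avg_cur_bal,
           data = train, family = binomial(link = "logit"))

# scope = train would pass a data.frame; scope must be a formula.
# reformulate() builds "~ acc_open_past_24mths + avg_cur_bal + ..." from the names.
predictors <- setdiff(names(train), "loan_status")
upper <- reformulate(predictors)

res <- add1(fit, scope = upper)
print(res)
```

add1() then scores only the terms in the upper formula that are not already in the model, here bc_open_to_buy, delinq_2yrs and int_rate.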

train <- read.csv('training.csv')
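# apply() coerces a data.frame to a matrix, so is.factor() would always be FALSE;
# sapply() tests each column directly (from R 4.0.0, read.csv() returns text columns as character)
non_numeric <- sapply(train, function(x) is.factor(x) || is.character(x))
names(train)[non_numeric]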

glm.model_step_1 <- glm(loan_status ~ acc_open_past_24mths + avg_cur_bal + bc_open_to_buy,
                    data = na.omit(train), 
                    family = binomial(link = 'logit'))

add1(glm.model_step_1, scope=~.+delinq_2yrs+inq_last_6mths+int_rate)

The result is:

Model:
loan_status ~ acc_open_past_24mths + avg_cur_bal + bc_open_to_buy
               Df Deviance    AIC
<none>              1038.6 1046.6
delinq_2yrs     1   1037.9 1047.9
inq_last_6mths  1   1038.0 1048.0
int_rate        1   1038.0 1048.0
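add1() performs a single step: it scores each candidate term added on its own to the current model. Forward selection repeats that step, adding the candidate with the lowest AIC until none improves on the <none> row (in the table above, none does). A minimal sketch under those assumptions, with a hypothetical helper forward_select() and simulated data in place of training.csv:

```r
set.seed(42)
# Simulated stand-in for training.csv (assumption): only int_rate drives loan_status
n <- 500
train <- data.frame(
  acc_open_past_24mths = rpois(n, 4),
  avg_cur_bal = rnorm(n, 10, 3),
  delinq_2yrs = rpois(n, 0.3),
  int_rate = runif(n, 5, 25)
)
train$loan_status <- rbinom(n, 1, plogis(-3 + 0.12 * train$int_rate))

# Hypothetical helper: greedy forward selection by AIC built on add1()
forward_select <- function(fit, upper) {
  repeat {
    # candidate terms = terms in the upper scope not yet in the model
    remaining <- setdiff(attr(terms(upper), "term.labels"),
                         attr(terms(fit), "term.labels"))
    if (length(remaining) == 0) break
    tab <- add1(fit, scope = remaining)   # a character vector of terms is also accepted
    cand <- tab[-1, , drop = FALSE]       # drop the "<none>" row
    best <- rownames(cand)[which.min(cand$AIC)]
    # stop when no candidate lowers AIC relative to the current model
    if (cand[best, "AIC"] >= tab["<none>", "AIC"]) break
    fit <- update(fit, as.formula(paste(". ~ . +", best)))
  }
  fit
}

fit0 <- glm(loan_status ~ 1, data = train, family = binomial(link = "logit"))
upper <- reformulate(setdiff(names(train), "loan_status"))
final <- forward_select(fit0, upper)
print(formula(final))
```

stats::step(fit0, scope = upper, direction = "forward") runs a comparable AIC-based search in one call; the manual loop is mainly useful for seeing what each add1() step contributes.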