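When I run a regression analysis in R (using glm), cases are dropped because of missing data. Is there a way to flag which cases have been dropped? Ideally, I would like to remove them from the original data frame.
Many thanks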
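Answer 0 (score: 1)
Without a reproducible example I can't provide code tailored to your problem, but here is a general approach that should work. Suppose your data frame is called df and your variables are called y, x1, x2, and so on, and suppose you want y, x1, x3, and x6 in your model.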
# Make a vector of the variables that you want to include in your glm model
# (Be sure to include any weighting or subsetting variables as well, per Josh's comment)
glm.vars = c("y","x1","x3","x6")
# Create a new data frame that includes only those rows with no missing values
# for the variables that are in your model
df.glm = df[complete.cases(df[ , glm.vars]), ]
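Also, if you just want to look at the rows that have at least one missing value in the model variables, do the following (note the added ! ("not") operator):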
df[!complete.cases(df[ , glm.vars]), ]
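To flag the dropped cases in the original data frame rather than build a separate copy, the same complete.cases() test can be stored as a logical column. The following is a minimal sketch; the data values and the column name dropped are made up for illustration, and it assumes glm() is run with R's default na.action (na.omit).

```r
# Hypothetical data frame; the column names follow the answer's y, x1, x3, x6.
df <- data.frame(
  y  = c(1.2, NA, 3.1, 4.8, 5.0),
  x1 = c(0.5, 1.5, NA, 3.5, 4.5),
  x2 = c(NA, 2, 3, 4, 5),        # not in the model, so its NA is ignored
  x3 = c(10, 20, 30, 40, 50),
  x6 = c(1, 0, 1, 0, NA)
)

glm.vars <- c("y", "x1", "x3", "x6")

# TRUE for rows with a missing value in any model variable
# ("dropped" is an illustrative column name, not something glm() creates)
df$dropped <- !complete.cases(df[, glm.vars])

which(df$dropped)     # positions of the flagged rows
df[!df$dropped, ]     # the rows that remain available to the model
```

Keeping the flag as a column means the original rows stay in place, so you can inspect or filter them later without rebuilding the subset.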
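Answer 1 (score: 1)
The model fit object returned by glm() records the row numbers of the data it excluded for being incomplete. They are a bit buried, but you can retrieve them like this: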
## Example data.frame with some missing data
df <- mtcars[1:6, 1:5]
df[cbind(1:5,1:5)] <- NA
df
#                    mpg cyl disp  hp drat
# Mazda RX4           NA   6  160 110 3.90
# Mazda RX4 Wag     21.0  NA  160 110 3.90
# Datsun 710        22.8   4   NA  93 3.85
# Hornet 4 Drive    21.4   6  258  NA 3.08
# Hornet Sportabout 18.7   8  360 175   NA
# Valiant           18.1   6  225 105 2.76
## Fit an example model, and learn which rows it excluded
f <- glm(mpg ~ drat, weights = disp, data = df)
as.numeric(na.action(f))
# [1] 1 3 5
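Since the question asks about flagging the cases and removing them from the original data frame, here is one way the indices from na.action() might be put to use. This is a sketch rather than a definitive recipe: the names dropped and df.used are illustrative, and it assumes the default na.action (na.omit), under which na.action(f) is NULL when nothing was excluded.

```r
# Same example data as above, rebuilt so this block runs on its own
df <- mtcars[1:6, 1:5]
df[cbind(1:5, 1:5)] <- NA
f <- glm(mpg ~ drat, weights = disp, data = df)

dropped <- na.action(f)   # NULL when no rows were excluded

# Flag the excluded rows in the original data frame
# ("dropped" as a column name is an illustrative choice)
df$dropped <- seq_len(nrow(df)) %in% as.integer(dropped)

# Keep only the rows the model used. Filtering on the flag still works
# when `dropped` is NULL, whereas df[-dropped, ] would return zero rows.
df.used <- df[!df$dropped, ]

names(dropped)            # row names of the excluded cases
```

Filtering on a logical flag is chosen here over negative indexing because negative indexing with an empty index silently selects no rows at all.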
Alternatively, to get the row indices without having to fit the model, use the same strategy on the output of model.frame():
as.numeric(na.action(model.frame(mpg ~ drat, weights = disp, data = df)))
# [1] 1 3 5