I have the following data.frame:
time values outlier
20/01/2010 11 no
20/02/2010 12 no
20/03/2010 11 no
20/04/2010 12 no
20/05/2010 10 no
20/06/2010 20 yes
20/07/2010 11 no
20/02/2010 12 no
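I want to run a regression on this data frame, with values as the response (dependent variable) and time as the predictor (independent variable). However, I want to exclude every row that has "yes" in the outlier column.
Here is my attempt: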
# Keep only the non-outlier rows (assumes df$time has already been converted to Date)
temp <- subset(df, outlier == "no")
fit <- lm(values ~ time, data = temp)
intrcpt <- coef(fit)[[1]]
slope <- coef(fit)[[2]]
# Points on the fitted line for the non-outlier rows
temp$regression_points <- as.numeric(temp$time) * slope + intrcpt
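Now I want to use the fitted regression model to predict the values for the rows in temp and put the results back into the original data frame, like this: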
time values outlier regression_points
20/01/2010 11 no 11
20/02/2010 12 no 11
20/03/2010 11 no 11
20/04/2010 12 no 11
20/05/2010 10 no 11
20/06/2010 20 yes
20/07/2010 11 no 11
20/02/2010 12 no 11
How can I do this?
Answer 0 (score: 3)
Take a look at the following code:
# Create example data
set.seed(1)
df <- data.frame(time = as.Date(1:100, origin = "1970-01-01"),
                 value = runif(100),
                 outlier = sample(0:1, 100, TRUE))
# Fit the model on the non-outliers only
fit <- lm(value ~ time, data = df[df$outlier == 0, ])
# Predict for every row of df, then set the outlier rows to NA.
# predict() with newdata returns one value per row of df.
df$regression_points <- ifelse(df$outlier == 1, NA, predict(fit, newdata = df))
# Result: outlier rows hold NA, all other rows hold the value on the fitted line
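When writing model output back into the full data frame, it is worth confirming that the vector being assigned has one entry per row. The sketch below is only an illustration on made-up data: the names `toy` and `fit_toy` are hypothetical and not part of the answer. It compares the length of `fitted()` (rows used in the fit) with `predict()` called with `newdata` (every row), assuming the model was fitted on a subset.

```r
# Hypothetical toy data: six rows, rows 2 and 5 flagged as outliers.
# The non-outlier values lie exactly on a line, so the result is easy to check.
toy <- data.frame(
  time    = as.Date("2010-01-01") + 0:5,
  value   = c(1, 9, 3, 4, 9, 6),
  outlier = c(0, 1, 0, 0, 1, 0)
)

# Fit on the non-outlier rows only
fit_toy <- lm(value ~ time, data = toy[toy$outlier == 0, ])

# fitted() covers only the rows that were used in the fit
length(fitted(fit_toy))                  # 4

# predict() with newdata covers every row of toy
length(predict(fit_toy, newdata = toy))  # 6

# Safe to combine with ifelse(), because both vectors have nrow(toy) entries
toy$regression_points <- ifelse(toy$outlier == 1, NA,
                                predict(fit_toy, newdata = toy))
toy$regression_points  # 1 NA 3 4 NA 6 (up to floating-point rounding)
```

Because the shorter `fitted()` vector would be silently recycled inside `ifelse()`, a length check like this is a cheap safeguard before assigning.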
Answer 1 (score: 3)
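Create a new data frame, df2, in which the outliers' values are set to NA, then fit the model with na.exclude, so that fitted() returns NA for the excluded rows and stays aligned with df: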
df2 <- transform(df, values = ifelse(outlier == "no", values, NA))
fm <- lm(values ~ time, df2, na.action = na.exclude)
transform(df, fitted = fitted(fm))
giving:
time values outlier fitted
1 2010-01-20 11 no 11.64579
2 2010-02-20 12 no 11.49318
3 2010-03-20 11 no 11.35534
4 2010-04-20 12 no 11.20273
5 2010-05-20 10 no 11.05504
6 2010-06-20 20 yes NA
7 2010-07-20 11 no 10.75474
8 2010-02-20 12 no 11.49318
Note: the input, in reproducible form, is:
Lines <-
"time values outlier
20/01/2010 11 no
20/02/2010 12 no
20/03/2010 11 no
20/04/2010 12 no
20/05/2010 10 no
20/06/2010 20 yes
20/07/2010 11 no
20/02/2010 12 no"
df <- read.table(text = Lines, header = TRUE)
df$time <- as.Date(df$time, format = "%d/%m/%Y")
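To see what `na.exclude` contributes here, the following sketch fits the same model twice, once with `na.action = na.omit` and once with `na.action = na.exclude`, and compares the length of the fitted vectors. The names `dat`, `dat2`, `fm_omit` and `fm_exclude` are introduced only for this illustration; the data are the question's eight rows.

```r
# Rebuild the question's data so the sketch is self-contained
Lines <- "time values outlier
20/01/2010 11 no
20/02/2010 12 no
20/03/2010 11 no
20/04/2010 12 no
20/05/2010 10 no
20/06/2010 20 yes
20/07/2010 11 no
20/02/2010 12 no"
dat <- read.table(text = Lines, header = TRUE)
dat$time <- as.Date(dat$time, format = "%d/%m/%Y")

# Blank out the response for the outlier rows
dat2 <- transform(dat, values = ifelse(outlier == "no", values, NA))

# Same model, two ways of handling the NA row
fm_omit    <- lm(values ~ time, dat2, na.action = na.omit)
fm_exclude <- lm(values ~ time, dat2, na.action = na.exclude)

length(fitted(fm_omit))                   # 7: the NA row is dropped
length(fitted(fm_exclude))                # 8: the NA row is padded back in
unname(which(is.na(fitted(fm_exclude))))  # 6: the outlier's position
```

Both fits estimate the same coefficients; the difference is only in how `fitted()`, `residuals()` and `predict()` report the rows that were left out.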
Answer 2 (score: 2)
fit <- lm(values ~ time, subset=outlier=="no", data=df)
df$regression_points <- NA
df$regression_points[df$outlier=="no"] <- fitted(fit)
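As a rough cross-check, the sketch below applies the same subset-and-assign idea to the question's data and compares it with an `na.exclude` fit. The names `dat`, `keep`, `fit_sub` and `fm_excl` are hypothetical, chosen just for this example; the assignment relies on `fitted()` returning values in the row order of the kept rows.

```r
# The question's data, typed in directly so the sketch is self-contained
dat <- data.frame(
  time    = as.Date(c("2010-01-20", "2010-02-20", "2010-03-20", "2010-04-20",
                      "2010-05-20", "2010-06-20", "2010-07-20", "2010-02-20")),
  values  = c(11, 12, 11, 12, 10, 20, 11, 12),
  outlier = c("no", "no", "no", "no", "no", "yes", "no", "no")
)

# Logical mask of the rows that take part in the fit
keep <- dat$outlier == "no"

# Fit on the kept rows and write the fitted values back by position
fit_sub <- lm(values ~ time, data = dat, subset = keep)
dat$regression_points <- NA_real_
dat$regression_points[keep] <- fitted(fit_sub)

# Alternative route: set the outlier's response to NA and use na.exclude
fm_excl <- lm(values ~ time,
              data = transform(dat, values = ifelse(keep, values, NA)),
              na.action = na.exclude)

# The two routes should agree, including the NA for the outlier row
all.equal(dat$regression_points, unname(fitted(fm_excl)))  # TRUE
```

Storing the mask in a variable keeps the fit and the assignment tied to the same set of rows.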