Populating the elements of a column based on multiple conditions

Date: 2016-02-08 15:56:16

Tags: r

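I have a data frame, and I want to remove any week that contains an outlier. I would be happy if I could just flag the entire week as an outlier, because I know how to subset from there. I have not been able to come up with a suitable solution. I keep thinking that I need either to loop over subsets of weeks or to write a separate function that handles individual outlier weeks and call it with sapply. I have not gotten either approach to work.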

# Build one row per day of 2015, with the day of the week
date <- seq(as.Date("2015-01-01"), length.out = 365, by = "1 day")
df <- data.frame(date = date, dow = as.factor(weekdays(date)))

# Simulated values, with bounds at 2.75 standard deviations from the mean
set.seed(1115)
df$var1 <- rnorm(365, 1912, 40795)
stdev <- sd(df$var1, na.rm = TRUE)
avg <- mean(df$var1, na.rm = TRUE)
df$LB <- avg - (2.75 * stdev)
df$UB <- avg + (2.75 * stdev)

# Flag individual outlier rows, then record each row's week of the year
df$outlier <- ifelse(df$var1 < df$LB | df$var1 > df$UB, 1, 0)
df$weeknum <- as.numeric(format(df$date, "%U"))
head(df, 17)

> head(df, 17)
         date       dow       var1        LB       UB outlier weeknum
1  2015-01-01  Thursday  -7828.412 -114675.6 120479.8       0       0
2  2015-01-02    Friday  25674.456 -114675.6 120479.8       0       0
3  2015-01-03  Saturday -33588.871 -114675.6 120479.8       0       0
4  2015-01-04    Sunday -54418.175 -114675.6 120479.8       0       1
5  2015-01-05    Monday -10002.002 -114675.6 120479.8       0       1
6  2015-01-06   Tuesday  34050.390 -114675.6 120479.8       0       1
7  2015-01-07 Wednesday -37584.648 -114675.6 120479.8       0       1
8  2015-01-08  Thursday  84048.878 -114675.6 120479.8       0       1
9  2015-01-09    Friday -24801.346 -114675.6 120479.8       0       1
10 2015-01-10  Saturday  33974.637 -114675.6 120479.8       0       1
11 2015-01-11    Sunday  77432.088 -114675.6 120479.8       0       2
12 2015-01-12    Monday 128196.236 -114675.6 120479.8       1       2
13 2015-01-13   Tuesday   9740.418 -114675.6 120479.8       0       2
14 2015-01-14 Wednesday  26539.887 -114675.6 120479.8       0       2
15 2015-01-15  Thursday  12172.834 -114675.6 120479.8       0       2
16 2015-01-16    Friday   1032.544 -114675.6 120479.8       0       2
17 2015-01-17  Saturday  76870.095 -114675.6 120479.8       0       2

In the example above, the desired output is a 1 in the outlier column for every row where weeknum = 2.

2 answers:

Answer 0 (score: 0)

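You say "the desired output is a 1 in the outlier column for every row where weeknum = 2." So do you really need an outlier column at all? It looks like you could simply subset the data.frame on the value of the weeknum column, like this: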

df <- df[!(df$weeknum==2),]
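Hard-coding weeknum == 2 only works for this particular sample. Below is a minimal sketch of the same subsetting idea for any number of outlier weeks, assuming the outlier and weeknum columns from the question. The small data frame toy and the names bad_weeks and toy_clean are made up for illustration.

```r
# Toy data standing in for the question's df (values are made up)
toy <- data.frame(
  weeknum = c(0, 0, 1, 1, 2, 2),
  outlier = c(0, 0, 0, 0, 1, 0)
)

# Week numbers that contain at least one outlier row
bad_weeks <- unique(toy$weeknum[toy$outlier == 1])

# Keep only the rows whose week is not in that set
toy_clean <- toy[!(toy$weeknum %in% bad_weeks), ]
toy_clean
```

Because the weeks are looked up from the data, the same two lines should keep working when outliers show up in other weeks.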

Answer 1 (score: 0)

The answer involves testing one vector against another. Once I realized that, I was able to refine my search and find a suitable answer.

The code needed to flag every element correctly is:

# Create a subset containing only the outlier rows
out.df <- df[which(df$outlier == 1), ]
# Use %in% to check each row's weeknum against the weeknum values of the
# outlier rows: 1 if the week contains an outlier, else 0
df$outlier <- ifelse(df$weeknum %in% out.df$weeknum, 1, 0)

This gives the result:

> head(df, 17)
         date       dow       var1        LB       UB outlier weeknum
1  2015-01-01  Thursday  -7828.412 -114675.6 120479.8       0       0
2  2015-01-02    Friday  25674.456 -114675.6 120479.8       0       0
3  2015-01-03  Saturday -33588.871 -114675.6 120479.8       0       0
4  2015-01-04    Sunday -54418.175 -114675.6 120479.8       0       1
5  2015-01-05    Monday -10002.002 -114675.6 120479.8       0       1
6  2015-01-06   Tuesday  34050.390 -114675.6 120479.8       0       1
7  2015-01-07 Wednesday -37584.648 -114675.6 120479.8       0       1
8  2015-01-08  Thursday  84048.878 -114675.6 120479.8       0       1
9  2015-01-09    Friday -24801.346 -114675.6 120479.8       0       1
10 2015-01-10  Saturday  33974.637 -114675.6 120479.8       0       1
11 2015-01-11    Sunday  77432.088 -114675.6 120479.8       1       2
12 2015-01-12    Monday 128196.236 -114675.6 120479.8       1       2
13 2015-01-13   Tuesday   9740.418 -114675.6 120479.8       1       2
14 2015-01-14 Wednesday  26539.887 -114675.6 120479.8       1       2
15 2015-01-15  Thursday  12172.834 -114675.6 120479.8       1       2
16 2015-01-16    Friday   1032.544 -114675.6 120479.8       1       2
17 2015-01-17  Saturday  76870.095 -114675.6 120479.8       1       2

This is the result I wanted.
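As an alternative sketch, not taken from the answer above: base R's ave() can take the per-week maximum of the row-level flag, so a week gets a 1 as soon as any of its rows is an outlier. The toy data and the week_outlier column name are assumptions for illustration.

```r
# Toy data standing in for the question's df (values are made up)
toy <- data.frame(
  weeknum = c(0, 0, 1, 1, 2, 2),
  outlier = c(0, 0, 0, 0, 1, 0)
)

# ave() applies FUN within each weeknum group and returns a vector
# aligned with the original rows
toy$week_outlier <- ave(toy$outlier, toy$weeknum, FUN = max)
toy
```

Writing the result to a separate column keeps the original row-level flag intact. Note that "%U" week numbers restart every year, so data spanning more than one year would need the year included in the grouping as well.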