I have a data frame as follows:
vdate=c("12-04-2015","13-04-2015","14-04-2015","15-04-2015","12-05-2015","13-05-2015","14-05-2015",
        "15-05-2015","12-06-2015","13-06-2015","14-06-2015","15-06-2015")
month=c(4,4,4,4,5,5,5,5,6,6,6,6)
col1=c(12,12.4,14.3,3,5.3,1.8,7.6,4.5,7.6,10.7,12,15.7)
df=data.frame(vdate,month,col1)
I also have a vector of values that come from some earlier calculation:
pvar=c(8.4,2.4,12,14.4,2.3,3.5,7.8,5,16,5.4,18,18.4)
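Now I want to replace each pvar value that is less than the average of pvar for its month with that average.
For example, for month 4 the average of pvar is 9.3 ((8.4+2.4+12+14.4)/4).
Every month-4 value below that average (8.4 and 2.4) should then be replaced with it, so the month-4 pvar values become 9.3, 9.3, 12, 14.4.
I need to do this for every month in pvar.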
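The rule for a single month can be written out directly. A minimal base R sketch for month 4 only, using the month-4 pvar values from the question (the names p4 and m4 are just local variables chosen for this illustration):

```r
# Month-4 values of pvar from the question
p4 <- c(8.4, 2.4, 12, 14.4)

m4 <- mean(p4)      # (8.4 + 2.4 + 12 + 14.4) / 4 = 9.3
p4[p4 < m4] <- m4   # only 8.4 and 2.4 fall below the mean
print(p4)
```

Doing the same for every month is a matter of applying these two steps within each month group, which is what the answers below do.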
Answer 0 (score: 3)
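A base R solution is to use ave. Note that we first need to convert the date column into actual dates in order to extract the month (strsplit or a regular expression would also work, but I prefer to turn it into a proper date), i.e.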
df$vdate <- as.POSIXct(df$vdate, format = '%d-%m-%Y')
with(df, ave(pvar, format(vdate, '%m'), FUN = function(i) replace(i, i < mean(i), mean(i))))
#[1] 9.30 9.30 12.00 14.40 4.65 4.65 7.80 5.00 16.00 14.45 18.00 18.40
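Because df already has a month column, the date conversion is not strictly needed: grouping on month directly gives the same result, assuming that column always matches the month in vdate. A sketch with standalone vectors (raise_to_mean is a helper name introduced here for readability, not part of any package); inside the question's setup this is the same as with(df, ave(pvar, month, FUN = raise_to_mean)):

```r
pvar  <- c(8.4, 2.4, 12, 14.4, 2.3, 3.5, 7.8, 5, 16, 5.4, 18, 18.4)
month <- c(4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6)

# Hypothetical helper: raise every value below the group mean to that mean
raise_to_mean <- function(i) replace(i, i < mean(i), mean(i))

# ave() applies FUN within each month and returns a vector aligned with pvar
res <- ave(pvar, month, FUN = raise_to_mean)
print(res)
```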
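Based on your edit (only months 4 and 5 should be modified; month 6 stays as it is), I would solve it with dplyr, since that is probably more readable. I actually came up with two approaches.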
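First: create an extra grouping variable that stays constant across the months you want to modify and gives every row of any other month a group of its own. A single-row group's mean equals its value, so those rows are never replaced, i.e.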
library(dplyr)
cbind(df, pvar) %>%
  group_by(grp = cumsum(!month %in% c(4, 5)) + 1, month) %>%
  mutate(pvar = replace(pvar, pvar < mean(pvar), mean(pvar))) %>%
  ungroup() %>%
  select(-grp)
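To see why the extra grouping variable works, it helps to print it. The sketch below rebuilds grp from the question's month vector and then applies the same replacement with base R's ave(), which accepts several grouping variables; it is offered as a base R illustration of the idea, not as the answer's own code:

```r
month <- c(4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6)
pvar  <- c(8.4, 2.4, 12, 14.4, 2.3, 3.5, 7.8, 5, 16, 5.4, 18, 18.4)

# The counter only increases on rows outside months 4 and 5,
# so each of those rows lands in a group of its own.
grp <- cumsum(!month %in% c(4, 5)) + 1
print(grp)  # 1 1 1 1 1 1 1 1 2 3 4 5

# Group by (grp, month): a one-row group's mean is its own value,
# so the month-6 rows come back unchanged.
res <- ave(pvar, grp, month,
           FUN = function(i) replace(i, i < mean(i), mean(i)))
print(res)
```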
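Second: filter the months you need and do the calculation on them. Then separately filter the months you don't need, still attaching pvar but without changing it (needed so the columns match when binding), and bind the rows together, i.e.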
bind_rows(
  cbind(df, pvar) %>%
    filter(month %in% c(4, 5)) %>%
    group_by(month) %>%
    mutate(pvar = replace(pvar, pvar < mean(pvar), mean(pvar))),
  cbind(df, pvar) %>%
    filter(!month %in% c(4, 5))
)
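The same "only touch the selected months" idea can be sketched in base R without binding rows, by assigning into a logical subset. This is a swapped-in technique (subset assignment instead of filter plus bind_rows), shown with the question's vectors:

```r
month <- c(4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6)
pvar  <- c(8.4, 2.4, 12, 14.4, 2.3, 3.5, 7.8, 5, 16, 5.4, 18, 18.4)

# Rows whose month should be modified; all other rows are left alone
sel <- month %in% c(4, 5)

res <- pvar
res[sel] <- ave(pvar[sel], month[sel],
                FUN = function(i) replace(i, i < mean(i), mean(i)))
print(res)
```

One design difference: bind_rows() places the filtered months first and the remaining months after them, whereas subset assignment keeps the original row order even when months are interleaved.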
Both of the above give:
   vdate      month  col1  pvar
   <fct>      <dbl> <dbl> <dbl>
 1 12-04-2015     4  12.0  9.30
 2 13-04-2015     4  12.4  9.30
 3 14-04-2015     4  14.3 12.00
 4 15-04-2015     4   3.0 14.40
 5 12-05-2015     5   5.3  4.65
 6 13-05-2015     5   1.8  4.65
 7 14-05-2015     5   7.6  7.80
 8 15-05-2015     5   4.5  5.00
 9 12-06-2015     6   7.6 16.00
10 13-06-2015     6  10.7  5.40
11 14-06-2015     6  12.0 18.00
12 15-06-2015     6  15.7 18.40
Answer 1 (score: 1)
A dplyr-based solution could be:
# Additional condition month != 6 leaves month 6 unchanged
cbind(df, pvar) %>%
group_by(month) %>%
mutate(pvar = ifelse(pvar < mean(pvar) & month != 6, mean(pvar), pvar)) %>%
as.data.frame()
# vdate month col1 pvar
# 1 12-04-2015 4 12.0 9.30
# 2 13-04-2015 4 12.4 9.30
# 3 14-04-2015 4 14.3 12.00
# 4 15-04-2015 4 3.0 14.40
# 5 12-05-2015 5 5.3 4.65
# 6 13-05-2015 5 1.8 4.65
# 7 14-05-2015 5 7.6 7.80
# 8 15-05-2015 5 4.5 5.00
# 9 12-06-2015 6 7.6 16.00
# 10 13-06-2015 6 10.7 5.40
# 11 14-06-2015 6 12.0 18.00
# 12 15-06-2015 6 15.7 18.40
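For comparison, the same ifelse() rule can be sketched in base R. With its default FUN, ave() returns each row's month mean, which can then be compared element-wise; the variable m is just a name chosen for this illustration:

```r
month <- c(4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6)
pvar  <- c(8.4, 2.4, 12, 14.4, 2.3, 3.5, 7.8, 5, 16, 5.4, 18, 18.4)

# ave() with its default FUN (mean) gives every row its month's mean
m <- ave(pvar, month)

# Same rule as the dplyr version: raise below-mean values, except in month 6
res <- ifelse(pvar < m & month != 6, m, pvar)
print(res)
```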
Data
vdate=c("12-04-2015","13-04-2015","14-04-2015","15-04-2015","12-05-2015",
"13-05-2015","14-05-2015","15-05-2015","12-06-2015","13-06-2015",
"14-06-2015","15-06-2015")
month=c(4,4,4,4,5,5,5,5,6,6,6,6)
col1=c(12,12.4,14.3,3,5.3,1.8,7.6,4.5,7.6,10.7,12,15.7)
df=data.frame(vdate,month,col1)
pvar=c(8.4,2.4,12,14.4,2.3,3.5,7.8,5,16,5.4,18,18.4)