I have the following data frame in R:
name date month year hours
SSI 01-01-2016 01 2016 2000
SSI 02-01-2016 01 2016 1900
SSI 03-01-2016 01 2016 2038
SSI 04-01-2016 01 2016 2041
SSII 01-01-2016 01 2016 2000
SSII 02-01-2016 01 2016 2100
SSII 03-01-2016 01 2016 2105
SSII 04-01-2016 01 2016 2110
I want to calculate the lag of hours, grouped by name, month, and year. I can do this with the following code:
library(dplyr)

df1 <- df %>%
  group_by(name, year, month) %>%
  mutate(running_hrs = hours - lag(hours)) %>%
  as.data.frame()
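For reference, here is a minimal self-contained sketch of this step. The sample rows are typed in by hand, and the column types are an assumption (month is kept as character to preserve the leading zero). The last SSII reading is taken as 2110, the value that appears in the output table further down.

```r
library(dplyr)

# Sample data retyped from the question; column types are assumed.
df <- data.frame(
  name  = rep(c("SSI", "SSII"), each = 4),
  date  = rep(c("01-01-2016", "02-01-2016", "03-01-2016", "04-01-2016"), times = 2),
  month = "01",
  year  = 2016,
  hours = c(2000, 1900, 2038, 2041, 2000, 2100, 2105, 2110),
  stringsAsFactors = FALSE
)

# Difference between each reading and the previous one within a group.
# The first row of every group has no predecessor, so it becomes NA.
df1 <- df %>%
  group_by(name, year, month) %>%
  mutate(running_hrs = hours - lag(hours)) %>%
  as.data.frame()

print(df1)
```

The NA in the first row of each group matters for any later call to mean(), which returns NA unless the missing values are handled.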
What I want: wherever running_hrs is greater than 24 or less than 0, I want to cap the value with that month's average. This is what I am doing:
new_df <- df %>%
  group_by(name, year, month) %>%
  mutate(running_hrs = hours - lag(hours)) %>%
  mutate(running_hrs_new = ifelse(running_hrs > 24 | running_hrs < 0, mean(running_hrs), running_hrs)) %>%
  as.data.frame()
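One likely reason this attempt falls short: lag() leaves an NA in the first row of every group, and mean() returns NA whenever its input contains an NA unless na.rm = TRUE is set. Even with the NA dropped, the out-of-range values themselves would still be part of the average. A small base R sketch using the SSI differences (typed in by hand) illustrates both points:

```r
# running_hrs for SSI, as produced by hours - lag(hours)
running_hrs <- c(NA, -100, 138, 3)

# NA, because of the leading NA
plain_mean <- mean(running_hrs)

# Defined, but still averages in -100 and 138
mean_no_na <- mean(running_hrs, na.rm = TRUE)

# The out-of-range rows are therefore filled with NA
capped <- ifelse(running_hrs > 24 | running_hrs < 0, plain_mean, running_hrs)

print(plain_mean)
print(mean_no_na)
print(capped)
```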
Desired output:
name  date        month  year  hours  running_hrs  running_hrs_new
SSI   01-01-2016  01     2016  2000   NA
SSI   02-01-2016  01     2016  1900   -100         (3/4)
SSI   03-01-2016  01     2016  2038   138          (3/4)
SSI   04-01-2016  01     2016  2041   3            3
SSII  01-01-2016  01     2016  2000   NA
SSII  02-01-2016  01     2016  2100   100          (10/4)
SSII  03-01-2016  01     2016  2105   5            5
SSII  04-01-2016  01     2016  2110   5            5
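The out-of-range values should be replaced with the mean of the running hours that are less than 24 and greater than or equal to zero. I think we could use a conditional mean.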
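A conditional mean along those lines could look like the sketch below. It averages only the in-range values of each group, so on the sample data it yields 3 for SSI and 5 for SSII. That is not the same as the (3/4) and (10/4) figures in the table, which divide the in-range sum by the number of rows in the group. The data frame is retyped by hand with assumed column types, the last SSII reading is assumed to be 2110, and in_range is a hypothetical helper column.

```r
library(dplyr)

# Sample data retyped from the question; column types are assumed.
df <- data.frame(
  name  = rep(c("SSI", "SSII"), each = 4),
  month = "01",
  year  = 2016,
  hours = c(2000, 1900, 2038, 2041, 2000, 2100, 2105, 2110),
  stringsAsFactors = FALSE
)

new_df <- df %>%
  group_by(name, year, month) %>%
  mutate(
    running_hrs = hours - lag(hours),
    # TRUE only for non-missing values that are >= 0 and < 24
    in_range = !is.na(running_hrs) & running_hrs >= 0 & running_hrs < 24,
    # replace values above 24 or below 0 with the mean of the in-range values;
    # NA rows stay NA because the ifelse() test is NA for them
    running_hrs_new = ifelse(running_hrs > 24 | running_hrs < 0,
                             mean(running_hrs[in_range]),
                             running_hrs)
  ) %>%
  ungroup() %>%
  select(-in_range) %>%
  as.data.frame()

print(new_df)
```

If a group has no in-range values at all, mean() of an empty vector is NaN, so such groups would need separate handling.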
Answer 0 (score: 1)
Hope this helps!
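library(dplyr)
library(tidyr)

new_df <- df %>%
  group_by(name, year, month) %>%
  # difference from the previous reading within each group
  mutate(running_hrs = hours - lag(hours)) %>%
  # keep in-range differences; out-of-range ones count as 0 in the mean
  mutate(valid_running_hrs = ifelse(running_hrs < 24 & running_hrs > 0, running_hrs, 0)) %>%
  # the first row of each group is NA after lag(); count it as 0 as well
  replace_na(list(valid_running_hrs = 0)) %>%
  group_by(name, year, month) %>%
  # replace out-of-range differences with the group mean of the valid values
  mutate(running_hrs_new = ifelse(running_hrs > 24 | running_hrs < 0, mean(valid_running_hrs), running_hrs)) %>%
  as.data.frame()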
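To check this approach against the desired output, the sketch below runs the same idea on the sample data, retyped by hand with assumed column types (the last SSII reading is assumed to be 2110, as in the desired output). It is a dplyr-only variant that folds the replace_na() step into the ifelse() condition, so it is not the exact code above. Out-of-range and missing differences count as 0 in the group mean, which gives 3/4 = 0.75 for SSI and 10/4 = 2.5 for SSII.

```r
library(dplyr)

# Sample data retyped from the question; column types are assumed.
df <- data.frame(
  name  = rep(c("SSI", "SSII"), each = 4),
  month = "01",
  year  = 2016,
  hours = c(2000, 1900, 2038, 2041, 2000, 2100, 2105, 2110),
  stringsAsFactors = FALSE
)

new_df <- df %>%
  group_by(name, year, month) %>%
  mutate(
    running_hrs = hours - lag(hours),
    # in-range differences are kept; NA and out-of-range ones become 0
    valid_running_hrs = ifelse(!is.na(running_hrs) & running_hrs < 24 & running_hrs > 0,
                               running_hrs, 0),
    # out-of-range differences are replaced by the group mean of the valid column
    running_hrs_new = ifelse(running_hrs > 24 | running_hrs < 0,
                             mean(valid_running_hrs), running_hrs)
  ) %>%
  as.data.frame()

print(new_df$running_hrs_new)
# NA 0.75 0.75 3.00 NA 2.50 5.00 5.00
```

The first row of each group stays NA in running_hrs_new, matching the blank cells in the desired output.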