我正在尝试将数据滞后到R中的前一天。但是,在数据集中,每天都有多个观测值。我该如何实现?
我已经研究过使用dplyr滞后变量并使用if语句来实现这一点,但是这将需要大约8个嵌套的if语句,以确保所有观察结果都滞后于前一天。
df <- df %>% dplyr::group_by(HomeTeam) %>%
dplyr::arrange(Date) %>%
dplyr::mutate(Score = ifelse(lag(Date) != Date, lag(Score),
ifelse(lag(Date, n = 2) != lag(Date),
lag(Score,n = 2), ifelse...)))
df <- data.frame(HomeTeam = c("Wolves", "Wolves", "Wolves"), Date = c("2019-08-20", "2019-08-20", "2019-08-19")
输入数据
HomeTeam Date Score
Wolves 2019-08-20 3
Wolves 2019-08-20 1
Wolves 2019-08-19 4
输出数据
HomeTeam Date Score
Wolves 2019-08-20 4
Wolves 2019-08-20 4
Wolves 2019-08-19 NA
解决方案
df <- data.frame(HomeTeam = c("Wolves", "Wolves",
"Wolves","Wolves","Wolves", "Man Utd", "Man Utd", "Man Utd"), Date =
c("2019-08-20", "2019-08-20", "2019-08-19", "2019-08-19", "2019-08-15",
"2019-06-01", "2019-06-01", "2019-04-01"), Score = c(3,1,2,2,4,5,6,7))
df %>% dplyr::mutate(Date = as.Date(Date)) %>%
dplyr::arrange(Date)%>%
dplyr::group_by(HomeTeam) %>%
dplyr::mutate(lagScore = lag(Score)) %>%
dplyr::arrange(Date) %>%
dplyr::group_by(Date,HomeTeam) %>%
dplyr::mutate(lagScore = lagScore[1]) %>%
dplyr::ungroup()
# A tibble: 8 x 4
# HomeTeam Date Score lagScore
# <fct> <date> <dbl> <dbl>
# Man Utd 2019-04-01 7 NA
# Man Utd 2019-06-01 5 7
# Man Utd 2019-06-01 6 7
# Wolves 2019-08-15 4 NA
# Wolves 2019-08-19 2 4
# Wolves 2019-08-19 2 4
# Wolves 2019-08-20 3 2
# Wolves 2019-08-20 1 2
答案 0 :(得分:2)
假设您落后于以前发生的日期(可能不是“昨天”),请尝试以下操作:
library(dplyr)
set.seed(2)
data.frame(
id = 1:10,
Date = Sys.Date() + sort(sample(c(0, 1, 3), size=10, replace=TRUE))
) %>%
mutate(lagDate = lag(Date)) %>%
group_by(Date) %>%
mutate(lagDate = lagDate[1]) %>%
ungroup()
# # A tibble: 10 x 3
# id Date lagDate
# <int> <date> <date>
# 1 1 2019-08-20 NA
# 2 2 2019-08-20 NA
# 3 3 2019-08-20 NA
# 4 4 2019-08-21 2019-08-20
# 5 5 2019-08-21 2019-08-20
# 6 6 2019-08-21 2019-08-20
# 7 7 2019-08-23 2019-08-21
# 8 8 2019-08-23 2019-08-21
# 9 9 2019-08-23 2019-08-21
# 10 10 2019-08-23 2019-08-21
多一点,尝试一下:
df <- read.table(header=TRUE, stringsAsFactors=FALSE, text="
HomeTeam Date Score
Wolves 2019-08-21 96
Wolves 2019-08-21 97
Wolves 2019-08-20 3
Wolves 2019-08-20 1
Wolves 2019-08-19 4")
df$Date <- as.Date(df$Date)
df %>%
mutate(lagDate = lag(Date)) %>%
group_by(HomeTeam, Date) %>%
summarize(lagDate = lagDate[1], Score = Score[1]) %>%
ungroup() %>%
select(HomeTeam, Date = lagDate, Score) %>%
right_join(select(df, -Score), by = c("HomeTeam", "Date"))
# # A tibble: 5 x 3
# HomeTeam Date Score
# <chr> <date> <int>
# 1 Wolves 2019-08-21 3
# 2 Wolves 2019-08-21 3
# 3 Wolves 2019-08-20 4
# 4 Wolves 2019-08-20 4
# 5 Wolves 2019-08-19 NA