My df1 is sorted by date, as follows:
Date <- c("12/17/17","12/19/17","12/20/17","12/30/17","12/31/17","1/1/18")
Jon <- c(388,299,412,NA,NA,353)
Eric <- c(121,NA,321,473,832,NA)
Scott <- c(NA,122,NA,NA,NA,424)
df1 <- data.frame(Date,Jon,Eric,Scott)
df1$Date <- as.Date(df1$Date,format='%m/%d/%y')
#df1
        Date Jon Eric Scott
1 2017-12-17 388  121    NA
2 2017-12-19 299   NA   122
3 2017-12-20 412  321    NA
4 2017-12-30  NA  473    NA
5 2017-12-31  NA  832    NA
6 2018-01-01 353   NA   424
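I am trying to create a new list that contains, for each person, only the non-NA values from the 12 days up to and including that person's most recent non-NA value. If only one non-NA value falls within that 12-day window, I want that person's 2 most recent non-NA values instead, even if one of them lies outside the window.
The following code successfully puts the data from the 12 days up to each person's most recent non-NA value into a new list: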
df2 <- lapply(df1[-1],function(x) x[which((m=tail(df1$Date[!is.na(x)],1)-df1$Date)>=0&m<=12)])
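For readers who find the one-liner dense, here is a sketch of the same window logic unpacked into named steps. The helper name within_window and its window argument are made up for illustration, and max() stands in for tail(..., 1), which is equivalent only because df1 is sorted by date.

```r
# Rebuild the sample data so the sketch runs on its own.
Date  <- as.Date(c("12/17/17", "12/19/17", "12/20/17", "12/30/17", "12/31/17", "1/1/18"),
                 format = "%m/%d/%y")
Jon   <- c(388, 299, 412, NA, NA, 353)
Eric  <- c(121, NA, 321, 473, 832, NA)
Scott <- c(NA, 122, NA, NA, NA, 424)
df1   <- data.frame(Date, Jon, Eric, Scott)

# Hypothetical helper: values of x dated within `window` days
# before (and including) the most recent non-NA observation.
within_window <- function(x, dates, window = 12) {
  last_date <- max(dates[!is.na(x)])      # date of the latest non-NA value
  gap <- as.numeric(last_date - dates)    # days from each row to that date
  x[gap >= 0 & gap <= window]             # NAs inside the window are kept
}

df2_alt <- lapply(df1[-1], within_window, dates = df1$Date)
```

Because lapply() is called on the data frame's columns, the result keeps the names Jon, Eric and Scott. As in the original one-liner, NA values inside the window are still present.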
This code successfully gets the 2 most recent non-NA entries, whether or not they fall within the 12-day range:
df3 <- lapply(df1[-1], function(x) tail(x[!is.na(x)], n = 2))
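The next piece of code comes very close to doing what I want, except that it loses the column names. Note that the column names are replaced with numbers, unlike the two lapply statements above, which both preserve them.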
library(magrittr)  # provides %>%
withinRange <- lapply(df1[-1], function(x) x[which((m = tail(df1$Date[!is.na(x)], 1) - df1$Date) >= 0 & m <= 12)]) %>%
  lapply(function(x) length(x[!is.na(x)])) %>%
  as.data.frame()
df4 <- ifelse(
  withinRange[colnames(df1[-1])] > 1,
  lapply(df1[-1], function(x) x[which((m = tail(df1$Date[!is.na(x)], 1) - df1$Date) >= 0 & m <= 12)]),
  lapply(df1[-1], function(x) tail(x[!is.na(x)], n = 2))
)
How can I keep the column names?
Answer 0 (score: 1)
I would use the tidyverse packages to solve this.
library(tidyr)
library(dplyr)
library(lubridate)
df <- tibble(
  my_date = as.Date(
    c("12/17/17", "12/19/17", "12/20/17", "12/30/17", "12/31/17", "1/1/18"),
    "%m/%d/%y"
  ),
  jon = c(388, 299, 412, NA, NA, 353),
  eric = c(121, NA, 321, 473, 832, NA),
  scott = c(NA, 122, NA, NA, NA, 424)
)
A long format feels more natural for this kind of output.
df_long <- df %>%
  gather(key, value, -my_date) %>%
  drop_na() %>%
  group_by(key) %>%
  mutate(
    in_date = if_else(my_date >= max(my_date) - days(12), TRUE, FALSE),
    count = sum(in_date)
  ) %>%
  filter(in_date | count < 2) %>%
  top_n(2, my_date) %>%
  ungroup() %>%
  select(-c(in_date, count))
df_long
# # A tibble: 6 x 3
#   my_date    key   value
#   <date>     <chr> <dbl>
# 1 2017-12-20 jon     412
# 2 2018-01-01 jon     353
# 3 2017-12-30 eric    473
# 4 2017-12-31 eric    832
# 5 2017-12-19 scott   122
# 6 2018-01-01 scott   424
Thankfully, spreading back to the original columns takes only one extra step.
df_long %>% spread(key, value)
# # A tibble: 5 x 4
#   my_date     eric   jon scott
# * <date>     <dbl> <dbl> <dbl>
# 1 2017-12-19    NA    NA   122
# 2 2017-12-20    NA   412    NA
# 3 2017-12-30   473    NA    NA
# 4 2017-12-31   832    NA    NA
# 5 2018-01-01    NA   353   424
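One thing to watch in the pipeline above: top_n(2, my_date) caps every person at two rows, so eric's value of 321 on 2017-12-20, which lies within 12 days of 2017-12-31, is dropped. If all in-range values should be kept, a variant along the following lines may be closer to the question. It is only a sketch: it assumes tidyr 1.0.0 or later for pivot_longer(), and the names df_long2 and recent2 are made up for illustration.

```r
library(dplyr)
library(tidyr)

df <- tibble(
  my_date = as.Date(
    c("12/17/17", "12/19/17", "12/20/17", "12/30/17", "12/31/17", "1/1/18"),
    "%m/%d/%y"
  ),
  jon = c(388, 299, 412, NA, NA, 353),
  eric = c(121, NA, 321, 473, 832, NA),
  scott = c(NA, 122, NA, NA, NA, 424)
)

df_long2 <- df %>%
  # Long format, dropping NA observations as it goes.
  pivot_longer(-my_date, names_to = "key", values_to = "value",
               values_drop_na = TRUE) %>%
  group_by(key) %>%
  mutate(
    in_date = my_date >= max(my_date) - 12,     # within 12 days of the latest value
    count   = sum(in_date),                     # in-range observations per person
    recent2 = min_rank(desc(my_date)) <= 2      # the two most recent observations
  ) %>%
  # Keep every in-range row when there are at least two of them;
  # otherwise fall back to the two most recent rows.
  filter((count >= 2 & in_date) | (count < 2 & recent2)) %>%
  ungroup() %>%
  select(-in_date, -count, -recent2)
```

With the sample data this keeps three values for eric (321, 473, 832), two for jon (412, 353) and two for scott (122, 424). The result can be widened again with spread(key, value) or pivot_wider(names_from = key, values_from = value).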
Answer 1 (score: 0)
It seems to me that the simplest thing is to store the column headers in a variable and then reattach them:
myHeaders <- names(df1[-1])
withinRange <-lapply(df1[-1],function(x)x[which((m=tail(df1$Date[!is.na(x)],1)-df1$Date)>=0&m<=12)]) %>%
lapply(function(x)length(x[!is.na(x)])) %>%
as.data.frame()
df4 <- ifelse(withinRange[colnames(df1[-1])]>1,lapply(df1[-1],function(x) x[which((m=tail(df1$Date[!is.na(x)],1)-df1$Date)>=0&m<=12)]),lapply(df1[-1], function(x) tail(x[!is.na(x)], n = 2)))
names(df4) <- myHeaders
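As an alternative to reattaching names, the choice between the two rules can be made inside a single lapply() call, so the names never get lost. ifelse() is vectorised and returns a result shaped like its test argument, which is probably why the ifelse() call above does not hand back a plainly named list. The following is a minimal sketch, not a drop-in replacement: pick_recent, window and min_n are hypothetical names, and unlike the ifelse() version it returns only non-NA values in the in-range case.

```r
# Rebuild the sample data so the sketch runs on its own.
Date  <- as.Date(c("12/17/17", "12/19/17", "12/20/17", "12/30/17", "12/31/17", "1/1/18"),
                 format = "%m/%d/%y")
Jon   <- c(388, 299, 412, NA, NA, 353)
Eric  <- c(121, NA, 321, 473, 832, NA)
Scott <- c(NA, 122, NA, NA, NA, 424)
df1   <- data.frame(Date, Jon, Eric, Scott)

# Hypothetical helper: non-NA values within `window` days of the latest
# non-NA value, or the `min_n` most recent non-NA values when fewer than
# `min_n` fall inside that window.
pick_recent <- function(x, dates, window = 12, min_n = 2) {
  keep <- !is.na(x)
  vals <- x[keep]
  gap  <- as.numeric(max(dates[keep]) - dates[keep])  # days before the latest value
  in_range <- vals[gap <= window]
  if (length(in_range) >= min_n) in_range else tail(vals, min_n)
}

# lapply() over the columns keeps Jon, Eric and Scott as list names.
df4_alt <- lapply(df1[-1], pick_recent, dates = df1$Date)
```

The plain if/else works here because the decision is made once per column rather than element by element, which is the case ifelse() is designed for.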