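I have a data frame with weekly data by section. Each section has roughly 104 weeks of data, and there are 83 sections in total.
I have a second data frame containing the start and end week for each section, which I want to use to filter the main data frame.
In both tables the week is a combination of year and week number, e.g. 201501, and weeks always run from 1 to 52.
So in the example below I want to keep Section A for weeks 201401 to 201404 and Section B for weeks 201551 to 201603.
I initially thought I could add an extra column to the Weeks_Filter data frame holding the sequence of weeks between each section's start and end (duplicating the row for every week), then merge the two tables and keep all rows from Weeks_Filter (all.y = TRUE). This worked on a small sample, but I don't know how to generate the consecutive weeks, because they can span different years.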
Week <- c("201401","201402","201403","201404","201405", "201451", "201552", "201601", "201602", "201603")
Section <- c(rep("A",5),rep("B",5))
df <- data.frame(cbind(Week, Section))
Section <- c("A", "B")
Start <- c("201401","201551")
End <- c("201404","201603")
Weeks_Filter <- data.frame(cbind(Section, Start, End))
Answer 0 (score: 4)
The latest development version of data.table adds non-equi joins (in older versions you can use foverlaps):
library(data.table)
setDT(df) # convert to data.table in place
setDT(Weeks_Filter)
# fix the column types - you have factors currently, converting to integer
df[, Week := as.integer(as.character(Week))]
Weeks_Filter[, `:=`(Start = as.integer(as.character(Start)),
                    End = as.integer(as.character(End)))]
# the actual magic: the non-equi join returns the matching row numbers of df
df[df[Weeks_Filter, on = .(Section, Week >= Start, Week <= End), which = TRUE]]
# Week Section
#1: 201401 A
#2: 201402 A
#3: 201403 A
#4: 201404 A
#5: 201552 B
#6: 201601 B
#7: 201602 B
#8: 201603 B
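For data.table versions without non-equi joins, the foverlaps route mentioned above might look roughly like the sketch below. It rebuilds the example data with integer weeks so that it runs on its own; the WeekEnd column and the res name are assumptions introduced only for this illustration.

```r
library(data.table)

# Example data from the question, with weeks stored as integers
df <- data.table(
  Week = c(201401L, 201402L, 201403L, 201404L, 201405L,
           201451L, 201552L, 201601L, 201602L, 201603L),
  Section = rep(c("A", "B"), each = 5)
)
Weeks_Filter <- data.table(
  Section = c("A", "B"),
  Start = c(201401L, 201551L),
  End = c(201404L, 201603L)
)

# foverlaps() needs an interval on both sides, so treat each week
# as the degenerate interval [Week, Week] (WeekEnd is a helper column)
df[, WeekEnd := Week]

# the lookup table must be keyed, with the interval columns last
setkey(Weeks_Filter, Section, Start, End)

# keep only rows whose week lies within the section's [Start, End]
res <- foverlaps(df, Weeks_Filter,
                 by.x = c("Section", "Week", "WeekEnd"),
                 type = "within", nomatch = 0L)
res[, .(Week, Section)]
```

Comparing the codes as plain integers works here because YYYYWW values sort in calendar order.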
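Answer 1 (score: 1)
You can do this with dplyr.
One issue is that your weeks are characters, and they become factors the way you have encoded them. I took a shortcut and made them numeric, but I would suggest using lubridate to turn them into proper Date class vectors.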
library(dplyr)
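# attach each section's Start and End to every row of df
tempdf <- full_join(df, Weeks_Filter, by = "Section")
# convert the week codes to numeric (going via character in case they are factors)
tempdf$Week <- as.numeric(as.character(tempdf$Week))
tempdf$Start <- as.numeric(as.character(tempdf$Start))
tempdf$End <- as.numeric(as.character(tempdf$End))
tempdf_filt <- tempdf %>%
  group_by(Section) %>%
  filter(Week >= Start,
         Week <= End)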
There seems to be an issue in your data where "201451" should be "201551", but otherwise this returns what you want:
> tempdf_filt
Source: local data frame [8 x 4]
Groups: Section [2]
Week Section Start End
(dbl) (fctr) (dbl) (dbl)
1 201401 A 201401 201404
2 201402 A 201401 201404
3 201403 A 201401 201404
4 201404 A 201401 201404
5 201552 B 201551 201603
6 201601 B 201551 201603
7 201602 B 201551 201603
8 201603 B 201551 201603
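The lubridate suggestion above could be sketched as follows. The helper yearweek_to_date is hypothetical, and it assumes week 1 starts on 1 January of each year, which is not ISO 8601 week numbering; adjust it if your weeks are defined differently.

```r
library(lubridate)

# Hypothetical helper: map a "YYYYWW" code to the Date on which that week
# starts, assuming week 1 begins on 1 January (an assumption, not ISO 8601)
yearweek_to_date <- function(yw) {
  yw <- as.integer(as.character(yw))  # works for factor, character or numeric
  year <- yw %/% 100L
  week <- yw %% 100L
  make_date(year, 1L, 1L) + weeks(week - 1L)
}

week_dates <- yearweek_to_date(c("201401", "201552", "201601"))
week_dates
```

Once Week, Start and End are Date vectors, the same `Week >= Start, Week <= End` filter applies unchanged.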
Answer 2 (score: 0)
Creating a vector of all the desired weeks might work as a filter. Here is a rough example using base R:
# get zero-padded week numbers "01" to "52"
allWeeks <- sprintf("%02d", 1:52)
# get all year-weeks for 2014 to 2016 (each year paired with every week)
allWeeks <- paste0(rep(2014:2016, each = 52), allWeeks)
# filter vector to select desired weeks
keepWeeks <- allWeeks[grep("^201(40[1-4]|55[12]|60[1-3])$", allWeeks)]
dfKeeper <- df[df$Week %in% keepWeeks, ]
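I tried to build a regular expression that captures the desired weeks, but you may need to tweak it a bit. Note that this filters on the week alone and does not take Section into account.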
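Rather than hand-writing a regular expression, the desired weeks could also be generated arithmetically per section, which is close to what the question originally asked for. The sketch below is one possible approach, not a definitive one: week_seq and keep are hypothetical names, and it relies on the question's statement that every year has exactly 52 weeks.

```r
# Hypothetical helper: all YYYYWW codes from start to end inclusive,
# assuming every year has exactly 52 weeks (as stated in the question)
week_seq <- function(start, end) {
  # map YYYYWW to a running week index so that year boundaries are contiguous
  to_index <- function(yw) (yw %/% 100L) * 52L + (yw %% 100L - 1L)
  idx <- seq(to_index(start), to_index(end))
  # map the running index back to YYYYWW
  (idx %/% 52L) * 100L + (idx %% 52L + 1L)
}

# Example data from the question, with weeks stored as integers
df <- data.frame(
  Week = c(201401L, 201402L, 201403L, 201404L, 201405L,
           201451L, 201552L, 201601L, 201602L, 201603L),
  Section = rep(c("A", "B"), each = 5),
  stringsAsFactors = FALSE
)
Weeks_Filter <- data.frame(
  Section = c("A", "B"),
  Start = c(201401L, 201551L),
  End = c(201404L, 201603L),
  stringsAsFactors = FALSE
)

# one row per (Section, Week) that should be kept
keep <- do.call(rbind, lapply(seq_len(nrow(Weeks_Filter)), function(i) {
  data.frame(Section = Weeks_Filter$Section[i],
             Week = week_seq(Weeks_Filter$Start[i], Weeks_Filter$End[i]),
             stringsAsFactors = FALSE)
}))

# inner join keeps only the rows of df inside each section's range
dfKeeper <- merge(df, keep, by = c("Section", "Week"))
dfKeeper
```

Unlike the regular expression, this respects Section, and it does not need to be rewritten when the start and end weeks change.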
Answer 3 (score: -2)
library(data.table)
df <- merge(df, Weeks_Filter)
df[, -1] <- apply(df[, -1], 2, function(x) as.numeric(as.character(x)))
df <- data.table(df)
df[Week >= Start & Week <= End, .SD, by = Section]
The output is:
Section Start End Week
1: A 201401 201404 201401
2: A 201401 201404 201402
3: A 201401 201404 201403
4: A 201401 201404 201404
5: B 201551 201603 201552
6: B 201551 201603 201601
7: B 201551 201603 201602
8: B 201551 201603 201603