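I am working with a large dataset of travel behaviour data covering one week. Over that week, people kept a diary of every individual trip they made. Each person is identified by a unique identification number (ID). What I want to do is select two days of diary data (each of which may contain one or more trips) from the week of data available for each unique ID, and put them into a new data frame. An example data frame is given below: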
Df1 <- data.frame(
  ID = c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3),
  date = c("1st Nov", "1st Nov", "3rd Nov", "4th Nov", "4th Nov", "5th Nov",
           "2nd Nov", "2nd Nov", "3nd Nov", "4th Nov", "5th Nov", "5th Nov",
           "2nd Nov", "2nd Nov", "3nd Nov", "4th Nov", "5th Nov"))
Any help with the above would be appreciated.
Many thanks,
Katie
Answer 0 (score: 8)
Sounds like a job for plyr. To sample two random days for each ID:
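library(plyr)

# For each ID, keep every trip made on two randomly chosen days.
ddply(Df1, .(ID), function(x) {
  unique_days <- as.character(unique(x$date))
  if (length(unique_days) < 2) {
    # Fewer than two days recorded: keep whatever is available.
    randomSelDays <- unique_days
  } else {
    randomSelDays <- sample(unique_days, 2)
  }
  x[x$date %in% randomSelDays, ]
})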
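This returns all rows for the two selected days for each unique ID. If an ID has only one day of data, that day is returned. Because the days are sampled at random, the result changes from run to run; one possible output is: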
  ID    date
1  1 1st Nov
2  1 1st Nov
3  1 3rd Nov
4  2 3nd Nov
5  2 5th Nov
6  2 5th Nov
7  3 2nd Nov
8  3 2nd Nov
9  3 3nd Nov
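
If you would rather not depend on plyr, the same idea can be sketched in base R with split, lapply and rbind. This is only an illustration, not part of the answer above: sample_days and Df2 are hypothetical names, and the set.seed call is an assumption added purely so the draw can be repeated.

```r
# Example data from the question (date strings kept exactly as posted).
Df1 <- data.frame(
  ID = c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3),
  date = c("1st Nov", "1st Nov", "3rd Nov", "4th Nov", "4th Nov", "5th Nov",
           "2nd Nov", "2nd Nov", "3nd Nov", "4th Nov", "5th Nov", "5th Nov",
           "2nd Nov", "2nd Nov", "3nd Nov", "4th Nov", "5th Nov"),
  stringsAsFactors = FALSE
)

set.seed(42)  # assumption: fixed seed only so the selection is repeatable

# Hypothetical helper: keep all rows for up to n_days randomly chosen days.
sample_days <- function(x, n_days = 2) {
  days <- unique(as.character(x$date))
  # Sample positions rather than values, and cap the sample size at the
  # number of available days so IDs with a single day do not raise an error.
  keep <- days[sample.int(length(days), min(n_days, length(days)))]
  x[x$date %in% keep, , drop = FALSE]
}

# Split by ID, sample within each piece, then stack the pieces again.
Df2 <- do.call(rbind, lapply(split(Df1, Df1$ID), sample_days))
rownames(Df2) <- NULL
```

Df2 then holds every trip made on the sampled days, with at most two distinct dates per ID. Raising n_days selects more days per person without any other change.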