I have two data frames (df1 and df2), examples of which are shown below:
df1 <- data.frame(StationID = c(1,1,1,2,2,3,3,3,3,3),
Cameras = c("Cam1","Cam2","Cam2","Cam1","Cam1","Cam2","Cam1","Cam2","Cam1","Cam1"),
Start = c("2013-04-23","2013-04-23","2013-04-23","2013-04-23","2013-04-23","2013-04-23","2013-04-23","2013-04-23","2013-04-23","2013-04-23"),
End = c("2013-04-25","2013-04-25","2013-04-25","2013-04-25","2013-04-25","2013-04-25","2013-04-25","2013-04-25","2013-04-25","2013-04-25"))
df2 <- data.frame(StationID = c(1,1,2,2,3,3),
Cameras = c("Cam1","Cam2","Cam1","Cam2","Cam1","Cam2"))
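I want to generate a new data frame (df3) that finds the matches on two columns (StationID and Cameras) and then appends the "Start" and "End" date columns for each match. The code needs to add new columns dynamically based on the data, because some StationID/Cameras combinations have no match while others have many.
Example output: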
StationID Cameras Start1 End1 Start2 End2 Start3 End3
1 1 Cam1 2013-04-23 2013-04-25 <NA> <NA> <NA> <NA>
2 1 Cam2 2013-04-23 2013-04-25 2013-04-23 2013-04-25 <NA> <NA>
3 2 Cam1 2013-04-23 2013-04-25 2013-04-23 2013-04-25 <NA> <NA>
4 2 Cam2 <NA> <NA> <NA> <NA> <NA> <NA>
5 3 Cam1 2013-04-23 2013-04-25 2013-04-23 2013-04-25 2013-04-23 2013-04-25
6 3 Cam2 2013-04-23 2013-04-25 2013-04-23 2013-04-25 <NA> <NA>
I would appreciate any help with this task.
Thanks in advance!
Answer 0: (score: 2)
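We join the two datasets on 'StationID' and 'Cameras', and then use dcast from data.table, which can reshape multiple value.var columns into 'wide' format. Rows of df2 without a match in df1 are kept, with NA for Start and End.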
library(data.table)#1.9.7+
dcast(setDT(df1)[df2, on = c("StationID", "Cameras")],
StationID + Cameras ~rowid(StationID, Cameras), value.var = c("Start", "End"))
# StationID Cameras Start_1 Start_2 Start_3 End_1 End_2 End_3
#1: 1 Cam1 2013-04-23 NA NA 2013-04-25 NA NA
#2: 1 Cam2 2013-04-23 2013-04-23 NA 2013-04-25 2013-04-25 NA
#3: 2 Cam1 2013-04-23 2013-04-23 NA 2013-04-25 2013-04-25 NA
#4: 2 Cam2 NA NA NA NA NA NA
#5: 3 Cam1 2013-04-23 2013-04-23 2013-04-23 2013-04-25 2013-04-25 2013-04-25
#6: 3 Cam2 2013-04-23 2013-04-23 NA 2013-04-25 2013-04-25 NA
Note: rowid is from data.table 1.9.7. It can be installed from here. If we have version 1.9.6 or earlier, first create the row index by group with
dN <- setDT(df1)[df2, on = c("StationID", "Cameras")
][, rid := 1:.N, .(StationID, Cameras)]
and then do the dcast
dcast(dN, StationID + Cameras ~rid, value.var = c("Start", "End"))
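The dcast output groups all Start_* columns before the End_* columns, while the question asks for Start1, End1, Start2, End2, and so on. The following is a minimal sketch of one way to reorder and rename the columns afterwards. It assumes a data.table version that provides rowid, and the names wide, n and interleaved are only illustrative.

```r
library(data.table)

# Sample data from the question (every Start/End value is identical there)
df1 <- data.frame(
  StationID = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3),
  Cameras   = c("Cam1", "Cam2", "Cam2", "Cam1", "Cam1",
                "Cam2", "Cam1", "Cam2", "Cam1", "Cam1"),
  Start     = "2013-04-23",
  End       = "2013-04-25",
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  StationID = c(1, 1, 2, 2, 3, 3),
  Cameras   = c("Cam1", "Cam2", "Cam1", "Cam2", "Cam1", "Cam2"),
  stringsAsFactors = FALSE
)

# Join and reshape to wide format, as in the answer above
wide <- dcast(setDT(df1)[setDT(df2), on = c("StationID", "Cameras")],
              StationID + Cameras ~ rowid(StationID, Cameras),
              value.var = c("Start", "End"))

# Number of Start/End pairs, derived from the data rather than hard-coded
n <- (ncol(wide) - 2L) / 2L

# Build Start_1, End_1, Start_2, End_2, ... (rbind + as.vector interleaves)
interleaved <- as.vector(rbind(paste0("Start_", seq_len(n)),
                               paste0("End_", seq_len(n))))

# Reorder by reference, then drop the underscore to match the requested names
setcolorder(wide, c("StationID", "Cameras", interleaved))
setnames(wide, interleaved, sub("_", "", interleaved, fixed = TRUE))

print(wide)
```

Because n is computed from the reshaped table, the same steps should apply when the data contain more or fewer matches per group.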
Answer 1: (score: 0)
Maybe this is useful
library(dplyr)
library(tidyr)
full_join(df1, df2) %>%
  group_by(StationID, Cameras) %>%
  summarise_each(funs(toString)) %>%
  separate(col = Start, into = paste("Start", 1:3, sep = ""), sep = ", ", extra = "merge") %>%
  separate(col = End, into = paste("End", 1:3, sep = ""), sep = ", ", extra = "merge")
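The pipeline above hard-codes three Start/End pairs (1:3) and goes through comma-separated strings. As a hedged alternative, not the answerer's method, the sketch below numbers the matches within each group and lets tidyr's pivot_wider create as many columns as the data require. It assumes tidyr 1.2.0 or later for the names_vary argument, and the names rid and res are only illustrative.

```r
library(dplyr)
library(tidyr)

# Sample data from the question (every Start/End value is identical there)
df1 <- data.frame(
  StationID = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3),
  Cameras   = c("Cam1", "Cam2", "Cam2", "Cam1", "Cam1",
                "Cam2", "Cam1", "Cam2", "Cam1", "Cam1"),
  Start     = "2013-04-23",
  End       = "2013-04-25",
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  StationID = c(1, 1, 2, 2, 3, 3),
  Cameras   = c("Cam1", "Cam2", "Cam1", "Cam2", "Cam1", "Cam2"),
  stringsAsFactors = FALSE
)

res <- df2 %>%
  # Keep every row of df2; unmatched combinations get NA for Start and End
  left_join(df1, by = c("StationID", "Cameras")) %>%
  # Number the matches within each StationID/Cameras group
  group_by(StationID, Cameras) %>%
  mutate(rid = row_number()) %>%
  ungroup() %>%
  # One Start/End pair per match; "slowest" orders them Start1, End1, Start2, ...
  pivot_wider(names_from = rid,
              values_from = c(Start, End),
              names_sep = "",
              names_vary = "slowest")

print(res)
```

With this approach missing matches are real NA values rather than the string "NA", and the number of output columns follows the largest group in the data.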