Based on two data frames

Time: 2016-07-12 06:20:07

Tags: r

I have two data frames (df1 and df2), examples of which are shown below:

df1 <- data.frame(StationID = c(1,1,1,2,2,3,3,3,3,3),
                  Cameras   = c("Cam1","Cam2","Cam2","Cam1","Cam1","Cam2","Cam1","Cam2","Cam1","Cam1"),
                  Start     = c("2013-04-23","2013-04-23","2013-04-23","2013-04-23","2013-04-23","2013-04-23","2013-04-23","2013-04-23","2013-04-23","2013-04-23"),
                  End       = c("2013-04-25","2013-04-25","2013-04-25","2013-04-25","2013-04-25","2013-04-25","2013-04-25","2013-04-25","2013-04-25","2013-04-25"))


df2 <- data.frame(StationID = c(1,1,2,2,3,3),
                  Cameras   = c("Cam1","Cam2","Cam1","Cam2","Cam1","Cam2"))

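I would like to generate a new data frame (df3) that looks for matches on two columns (StationID and Cameras) and then appends the "Start" and "End" date columns to the corresponding match. The code needs to add new columns dynamically based on the data, because some instances have no match while others have many.

Example output below: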

  StationID Cameras     Start1       End1     Start2       End2     Start3       End3
1         1    Cam1 2013-04-23 2013-04-25       <NA>       <NA>       <NA>       <NA>
2         1    Cam2 2013-04-23 2013-04-25 2013-04-23 2013-04-25       <NA>       <NA>
3         2    Cam1 2013-04-23 2013-04-25 2013-04-23 2013-04-25       <NA>       <NA>
4         2    Cam2       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>
5         3    Cam1 2013-04-23 2013-04-25 2013-04-23 2013-04-25 2013-04-23 2013-04-25
6         3    Cam2 2013-04-23 2013-04-25 2013-04-23 2013-04-25       <NA>       <NA>

I would appreciate any help with this task.

Thanks in advance!

2 answers:

Answer 0: (score: 2)

We join the two datasets on 'StationID' and 'Cameras', then use dcast from data.table, which can reshape multiple value.var columns into 'wide' format.

 library(data.table)  # 1.9.7+ (needed for rowid)
 dcast(setDT(df1)[df2, on = c("StationID", "Cameras")],
       StationID + Cameras ~ rowid(StationID, Cameras), value.var = c("Start", "End"))
 # StationID Cameras    Start_1    Start_2    Start_3      End_1      End_2      End_3
 #1:         1    Cam1 2013-04-23         NA         NA 2013-04-25         NA         NA
 #2:         1    Cam2 2013-04-23 2013-04-23         NA 2013-04-25 2013-04-25         NA
 #3:         2    Cam1 2013-04-23 2013-04-23         NA 2013-04-25 2013-04-25         NA
 #4:         2    Cam2         NA         NA         NA         NA         NA         NA
 #5:         3    Cam1 2013-04-23 2013-04-23 2013-04-23 2013-04-25 2013-04-25 2013-04-25
 #6:         3    Cam2 2013-04-23 2013-04-23         NA 2013-04-25 2013-04-25         NA
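
To make the join step easier to follow, here is a minimal sketch on toy data. The tables x and y and their columns are hypothetical, not from the question. It shows that x[y, on = ...] keeps every row of y and fills the columns of x with NA where there is no match, which is why station 2 / Cam2 still appears in the result above.

```r
library(data.table)

# Hypothetical toy tables, only for illustration
x <- data.table(id = c(1, 1, 3), val = c("a", "b", "c"))
y <- data.table(id = c(1, 2))

# For each row of y, look up the matching rows of x on "id".
# id 1 matches two rows of x; id 2 matches none and gets val = NA.
joined <- x[y, on = "id"]
print(joined)
```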

Note: rowid comes from data.table 1.9.7, which can be installed from here. If we have 1.9.6 or earlier, create the row id with:

 dN <- setDT(df1)[df2, on = c("StationID", "Cameras")
                     ][, rid := 1:.N, .(StationID, Cameras)]

Then do the dcast:

dcast(dN, StationID + Cameras ~rid, value.var = c("Start", "End"))
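
As a quick check that the two ways of numbering rows agree, the sketch below builds both counters on a small hypothetical table (dt, rid_a and rid_b are illustrative names) and compares them. It assumes a data.table version that provides rowid.

```r
library(data.table)

# Hypothetical toy table reusing the question's key columns
dt <- data.table(StationID = c(1, 1, 1, 2),
                 Cameras   = c("Cam1", "Cam2", "Cam2", "Cam1"))

# Counter within each StationID/Cameras group, computed two ways
dt[, rid_a := rowid(StationID, Cameras)]
dt[, rid_b := seq_len(.N), by = .(StationID, Cameras)]

# TRUE if both counters are the same integer vector
same <- identical(dt$rid_a, dt$rid_b)
print(same)
```

This per-group counter is what supplies the numeric suffix in the Start_1, Start_2, ... column names after dcast.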

Answer 1: (score: 0)

Maybe this is useful:

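library(dplyr)
library(tidyr)
# Join on both keys, then collapse each group's dates into one comma-separated string
full_join(df1, df2, by = c("StationID", "Cameras")) %>%
  group_by(StationID, Cameras) %>%
  summarise_each(funs(toString)) %>%
  # Split back into columns; 1:3 is hard-coded to the largest group in this example
  separate(col = Start, into = paste("Start", 1:3, sep = ""), sep = ", ", extra = "merge", fill = "right") %>%
  separate(col = End, into = paste("End", 1:3, sep = ""), sep = ", ", extra = "merge", fill = "right")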
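
The number of columns in the code above is fixed at three. As a possible alternative that sizes itself from the data, here is a sketch using pivot_wider. It assumes tidyr 1.0.0 or later, which is newer than this answer, and the helper column n is an illustrative name. The data is copied from the question, with the constant dates recycled.

```r
library(dplyr)
library(tidyr)

# Data from the question; the single date values are recycled to every row
df1 <- data.frame(StationID = c(1,1,1,2,2,3,3,3,3,3),
                  Cameras   = c("Cam1","Cam2","Cam2","Cam1","Cam1","Cam2","Cam1","Cam2","Cam1","Cam1"),
                  Start     = "2013-04-23",
                  End       = "2013-04-25")
df2 <- data.frame(StationID = c(1,1,2,2,3,3),
                  Cameras   = c("Cam1","Cam2","Cam1","Cam2","Cam1","Cam2"))

df3 <- df2 %>%
  # Keep every row of df2; unmatched rows get NA dates
  left_join(df1, by = c("StationID", "Cameras")) %>%
  # Number the matches within each StationID/Cameras pair
  group_by(StationID, Cameras) %>%
  mutate(n = row_number()) %>%
  ungroup() %>%
  # One Start/End column per match number, e.g. Start1, End1
  pivot_wider(names_from = n, values_from = c(Start, End), names_sep = "")

print(df3)
```

Unlike the toString approach, unmatched pairs end up with real NA values rather than the string "NA".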