Creating Cab.ID subsets by date, direction and route in R

Asked: 2018-10-01 18:24:15

Tags: r aggregate reshape

I need to transform this table so that the Cab.ID values are collected into subsets by date, direction and route.

Date          Direction Cab.ID  Route
Sep 24, 2018    Logout  x-1      R1
Sep 24, 2018    Logout  x-2      R1
Sep 24, 2018    Logout  x-1      R2
Sep 24, 2018    Login   x-3      R1
Sep 25, 2018    Login   y-1      R3
Sep 25, 2018    Logout  z-1      R4
Sep 25, 2018    Logout  z-1      R4
Sep 25, 2018    Logout  x-4      R5
Sep 25, 2018    Login   x-4      R5
Sep 26, 2018    Login   x-3      R6
Sep 26, 2018    Login   x-5      R6

Required table

Date         Route    Login-Cabid   Logout-Cabid
Sep 24, 2018    R1      x-3           x-1,x-2
Sep 24, 2018    R2                    x-1
Sep 25, 2018    R3      y-1 
Sep 25, 2018    R4                    z-1
Sep 25, 2018    R5      x-4           x-4
Sep 26, 2018    R6      x-3,x-5 

Thanks

2 answers:

Answer 0 (score: 2)

In base R, we can use aggregate together with reshape

df2 <- aggregate(Cab.ID ~ Date + Direction + Route, unique(df1), toString)

reshape(df2, idvar = c("Date", "Route"), timevar = "Direction", direction = "wide")
#          Date Route Cab.ID.Login Cab.ID.Logout
#1 Sep 24, 2018    R1          x-3      x-1, x-2
#3 Sep 24, 2018    R2         <NA>           x-1
#4 Sep 25, 2018    R3          y-1          <NA>
#5 Sep 25, 2018    R4         <NA>           z-1
#6 Sep 25, 2018    R5          x-4           x-4
#8 Sep 26, 2018    R6     x-3, x-5          <NA>
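
The output above still shows NA for missing directions and uses the generated column names, whereas the required table has blanks and the headers Login-Cabid and Logout-Cabid. A minimal sketch of one way to close that gap, assuming the same sample data as the question (the object name wide is only illustrative):

```r
# Sample data, same values as in the question.
df1 <- data.frame(
  Date = c(rep("Sep 24, 2018", 4), rep("Sep 25, 2018", 5), rep("Sep 26, 2018", 2)),
  Direction = c("Logout", "Logout", "Logout", "Login", "Login", "Logout",
                "Logout", "Logout", "Login", "Login", "Login"),
  Cab.ID = c("x-1", "x-2", "x-1", "x-3", "y-1", "z-1", "z-1", "x-4", "x-4",
             "x-3", "x-5"),
  Route = c("R1", "R1", "R2", "R1", "R3", "R4", "R4", "R5", "R5", "R6", "R6"),
  stringsAsFactors = FALSE
)

# Collapse Cab.ID per Date/Direction/Route, then spread Direction into columns.
df2 <- aggregate(Cab.ID ~ Date + Direction + Route, unique(df1), toString)
wide <- reshape(df2, idvar = c("Date", "Route"), timevar = "Direction",
                direction = "wide")

# Rename the generated columns to mirror the headers of the required table.
names(wide)[names(wide) == "Cab.ID.Login"] <- "Login-Cabid"
names(wide)[names(wide) == "Cab.ID.Logout"] <- "Logout-Cabid"

# Show missing directions as blanks instead of NA, and reset the row names.
wide[is.na(wide)] <- ""
rownames(wide) <- NULL
print(wide)
```

Because the new names contain a hyphen, the columns have to be accessed as wide[["Login-Cabid"]] or with backticks.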

If you would rather use the tidyverse or data.table, you can do the following

library(dplyr)
library(tidyr)
df1 %>% 
  unique() %>% 
  group_by(Date, Route, Direction) %>% 
  summarise(Cab.ID = toString(Cab.ID)) %>% 
  spread(Direction, Cab.ID)
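
In more recent tidyr releases spread() is superseded by pivot_wider(). The following is a sketch of the same pipeline in that style, not part of the original answer; it assumes dplyr 1.0.0 or later (for the .groups argument) and tidyr 1.1.0 or later (for a scalar values_fill), and the object name wide is only illustrative:

```r
library(dplyr)
library(tidyr)

# Sample data, same values as in the question.
df1 <- data.frame(
  Date = c(rep("Sep 24, 2018", 4), rep("Sep 25, 2018", 5), rep("Sep 26, 2018", 2)),
  Direction = c("Logout", "Logout", "Logout", "Login", "Login", "Logout",
                "Logout", "Logout", "Login", "Login", "Login"),
  Cab.ID = c("x-1", "x-2", "x-1", "x-3", "y-1", "z-1", "z-1", "x-4", "x-4",
             "x-3", "x-5"),
  Route = c("R1", "R1", "R2", "R1", "R3", "R4", "R4", "R5", "R5", "R6", "R6"),
  stringsAsFactors = FALSE
)

wide <- df1 %>%
  distinct() %>%                                   # drop the repeated z-1 row
  group_by(Date, Route, Direction) %>%
  summarise(Cab.ID = toString(Cab.ID), .groups = "drop") %>%
  pivot_wider(names_from = Direction,              # one column per direction
              values_from = Cab.ID,
              values_fill = "")                    # blanks instead of NA

print(wide)
```

Using values_fill = "" is a choice made here to match the blanks in the required table; leaving it out keeps NA in the empty cells, as spread() does.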

library(data.table)
setDT(unique(df1))[, .(Cab.ID = toString(Cab.ID)), by = .(Date, Route, Direction)
           ][, dcast(.SD, Date + Route ~ Direction, value.var = 'Cab.ID')]
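
The data.table version above aggregates first and then casts. As a possible shortcut, dcast() can do the aggregation itself through its fun.aggregate argument. This is a sketch under the same sample data, not the original answer's code, and the names dt and wide are only illustrative:

```r
library(data.table)

# Sample data, same values as in the question.
df1 <- data.frame(
  Date = c(rep("Sep 24, 2018", 4), rep("Sep 25, 2018", 5), rep("Sep 26, 2018", 2)),
  Direction = c("Logout", "Logout", "Logout", "Login", "Login", "Logout",
                "Logout", "Logout", "Login", "Login", "Login"),
  Cab.ID = c("x-1", "x-2", "x-1", "x-3", "y-1", "z-1", "z-1", "x-4", "x-4",
             "x-3", "x-5"),
  Route = c("R1", "R1", "R2", "R1", "R3", "R4", "R4", "R5", "R5", "R6", "R6"),
  stringsAsFactors = FALSE
)

# Remove duplicate rows first, then convert to a data.table.
dt <- as.data.table(unique(df1))

# Cast to wide format and collapse multiple Cab.ID values per cell in one step.
wide <- dcast(dt, Date + Route ~ Direction,
              value.var = "Cab.ID", fun.aggregate = toString)

print(wide)
```

When fun.aggregate is supplied and fill is left at its default, the dcast documentation says missing cells are filled by calling the aggregate on a zero-length vector, so the empty cells here should come back as empty strings rather than NA.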

Data

df1 <- structure(list(Date = c("Sep 24, 2018", "Sep 24, 2018", "Sep 24, 2018", 
"Sep 24, 2018", "Sep 25, 2018", "Sep 25, 2018", "Sep 25, 2018", 
"Sep 25, 2018", "Sep 25, 2018", "Sep 26, 2018", "Sep 26, 2018"
), Direction = c("Logout", "Logout", "Logout", "Login", "Login", 
"Logout", "Logout", "Logout", "Login", "Login", "Login"), Cab.ID = c("x-1", 
"x-2", "x-1", "x-3", "y-1", "z-1", "z-1", "x-4", "x-4", "x-3", 
"x-5"), Route = c("R1", "R1", "R2", "R1", "R3", "R4", "R4", "R5", 
"R5", "R6", "R6")), .Names = c("Date", "Direction", "Cab.ID", 
"Route"), class = "data.frame", row.names = c(NA, -11L))

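Answer 1 (score: -1)

Agreeing with the answer above, you can use spread {tidyr} after the aggregate step:

library(tidyr)
# unique() drops the repeated z-1 row so that R4 does not list it twice
df2 <- aggregate(Cab.ID ~ Date + Direction + Route, unique(df1), toString)

spread(df2, key = Direction, value = Cab.ID)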