Compute the Type count and add the comma-separated Types to a data.table

Time: 2017-08-30 19:41:17

Tags: r data.table

I have a data frame like this:

ID <- c("ID001","ID001","ID001","ID002","ID002","ID002")
ToolID <- c("SWP","SWP","SWP","ISP","ISP","ISP")
Type <- c("A","B","C","D","E","A")
WHEN <- c("2017-08-15 12:44:11","2017-08-15 12:44:11","2017-08-14 19:07:11",
          "2017-08-17 11:24:15","2017-08-17 11:24:15","2017-08-17 11:24:15")

df <- data.frame(ID,ToolID,Type,WHEN) 
df$WHEN <- as.POSIXct(df$WHEN,format="%Y-%m-%d %H:%M:%S")

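I am trying to collapse all Type values into a single comma-separated column and count them per group (ToolID and ID), while keeping only the rows at MAX(WHEN), i.e. the most recent timestamp for each ID.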

Desired output:

     ID ToolID  Type Type_count                WHEN
  ID001    SWP   A,B          2 2017-08-15 12:44:11
  ID002    ISP D,E,A          3 2017-08-17 11:24:15

I tried using data.table and went about it this way:

library(data.table)
setDT(df)[, WHEN := as.POSIXct(WHEN)]
df1 <- df[, max(WHEN), by = list(ID,ToolID)]
colnames(df1 )[which(names(df1 ) == "V1")] <- "WHEN"

How can I add the Type and Type count columns to df1 to get the desired output? Could someone point me in the right direction?

1 answer:

Answer 0: (score: 1)

We can build a row index from a logical condition (WHEN equal to the group maximum), then pass that index in i and summarise by group:

i1 <- setDT(df)[, .I[WHEN == max(WHEN)], .(ID, ToolID)]$V1
df[i1, .(Type = toString(unique(Type)), Type_count = uniqueN(Type),
         WHEN = WHEN[1]), .(ID, ToolID)]
#      ID ToolID    Type Type_count                WHEN
#1: ID001    SWP    A, B          2 2017-08-15 12:44:11
#2: ID002    ISP D, E, A          3 2017-08-17 11:24:15
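
Note that toString() joins values with ", " (comma plus space), so the result reads "A, B" rather than the "A,B" shown in the desired output. If the exact format matters, one option is paste(..., collapse = ","). The following is a minimal self-contained sketch of the same two-step approach with that substitution. The names dt, idx and res are only illustrative, and tz = "UTC" is an assumption added so the timestamps parse the same way on any machine.

```r
library(data.table)

# Same sample data as in the question; tz = "UTC" is an assumption for reproducibility
dt <- data.table(
  ID     = c("ID001", "ID001", "ID001", "ID002", "ID002", "ID002"),
  ToolID = c("SWP", "SWP", "SWP", "ISP", "ISP", "ISP"),
  Type   = c("A", "B", "C", "D", "E", "A"),
  WHEN   = as.POSIXct(c("2017-08-15 12:44:11", "2017-08-15 12:44:11",
                        "2017-08-14 19:07:11", "2017-08-17 11:24:15",
                        "2017-08-17 11:24:15", "2017-08-17 11:24:15"),
                      format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
)

# Step 1: .I holds row numbers of the full table; within each (ID, ToolID)
# group, keep the rows whose WHEN equals the group's latest timestamp
idx <- dt[, .I[WHEN == max(WHEN)], by = .(ID, ToolID)]$V1

# Step 2: summarise only those rows, joining the unique Types without a space
res <- dt[idx, .(Type       = paste(unique(Type), collapse = ","),
                 Type_count = uniqueN(Type),
                 WHEN       = WHEN[1]),
          by = .(ID, ToolID)]
print(res)
```

Here idx contains rows 1, 2, 4, 5 and 6: row 3 (Type "C") is dropped because its timestamp is older than the latest one for ID001.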