I have a data frame like this:
ID <- c("ID001","ID001","ID001","ID002","ID002","ID002")
ToolID <- c("SWP","SWP","SWP","ISP","ISP","ISP")
Type <- c("A","B","C","D","E","A")
WHEN <- c("2017-08-15 12:44:11","2017-08-15 12:44:11","2017-08-14 19:07:11",
"2017-08-17 11:24:15","2017-08-17 11:24:15","2017-08-17 11:24:15")
df <- data.frame(ID,ToolID,Type,WHEN)
df$WHEN <- as.POSIXct(df$WHEN,format="%Y-%m-%d %H:%M:%S")
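I am trying to collapse all Type values into a single comma-separated column and count them per group (ToolID & ID), while keeping only the rows with MAX(WHEN), i.e. the most recent timestamp for the corresponding ID.
The desired output is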
ID ToolID Type Type_count WHEN
ID001 SWP A,B 2 2017-08-15 12:44:11
ID002 ISP D,E,A 3 2017-08-17 11:24:15
I tried using data.table and did it this way:
library(data.table)
setDT(df)[, WHEN := as.POSIXct(WHEN)]
df1 <- df[, max(WHEN), by = list(ID,ToolID)]
colnames(df1 )[which(names(df1 ) == "V1")] <- "WHEN"
How do I add Type and the Type count to df1 to get the desired output? Can someone point me in the right direction?
Answer 0: (score: 1)
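We can create a row index based on a logical condition (WHEN equal to the group maximum), then pass that index in i, group by ID and ToolID, and compute the summary: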
i1 <- setDT(df)[, .I[WHEN == max(WHEN)], .(ID, ToolID)]$V1
df[i1, .(Type = toString(unique(Type)), Type_count = uniqueN(Type),
WHEN = WHEN[1]), .(ID, ToolID)]
# ID ToolID Type Type_count WHEN
#1: ID001 SWP A, B 2 2017-08-15 12:44:11
#2: ID002 ISP D, E, A 3 2017-08-17 11:24:15
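The same result can probably also be obtained in a single grouped call, without the intermediate index. The following is only a sketch, not the answer's own method: the names `latest` and `res` are made up for illustration, and `tz = "UTC"` is an assumption added so the timestamps parse the same way on any machine.

```r
library(data.table)

# Rebuild the example data from the question as a data.table
df <- data.table(
  ID     = c("ID001", "ID001", "ID001", "ID002", "ID002", "ID002"),
  ToolID = c("SWP", "SWP", "SWP", "ISP", "ISP", "ISP"),
  Type   = c("A", "B", "C", "D", "E", "A"),
  WHEN   = as.POSIXct(
    c("2017-08-15 12:44:11", "2017-08-15 12:44:11", "2017-08-14 19:07:11",
      "2017-08-17 11:24:15", "2017-08-17 11:24:15", "2017-08-17 11:24:15"),
    format = "%Y-%m-%d %H:%M:%S", tz = "UTC"  # tz is an assumption
  )
)

# Within each (ID, ToolID) group, keep only the Types recorded at the
# latest timestamp, then collapse and count them.
res <- df[, {
  latest <- WHEN == max(WHEN)   # logical mask of the most recent rows
  .(Type       = toString(unique(Type[latest])),
    Type_count = uniqueN(Type[latest]),
    WHEN       = max(WHEN))
}, by = .(ID, ToolID)]

print(res)
```

Note that `toString()` joins with ", " (comma plus space). If the output must match "A,B" exactly, `paste(unique(Type[latest]), collapse = ",")` could be used instead.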