I have a data.frame that looks like this:
OBJECT ID TASK
1 A
1 C
1 D
1 E
2 A
2 B
2 C
2 D
2 F
Now I want to count the unique consecutive pairs of tasks in the data.frame, to get the following result:
PREDECESSOR SUCCESSOR COUNT
A C 1
C D 2
D E 1
A B 1
B C 1
D F 1
I have already worked out how to extract the consecutive values with two for loops, but I am failing at storing and counting them in a new data.frame (or list).
Answer 0 (score: 2)
A solution using data.table:
Code:
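library(data.table)
setDT(df)  # convert df to a data.table by reference
# previous TASK within each OBJECT (NA on the first row of each object)
df[, TASK0 := shift(TASK), OBJECT]
# count each (TASK, TASK0) pair, then rename the columns and total the counts
df[!is.na(TASK0), .N, .(TASK, TASK0)][, .(
COUNT = sum(N)), .(PREDECESSOR = TASK0, SUCCESSOR = TASK)]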
Result:
PREDECESSOR SUCCESSOR COUNT
1: A C 1
2: C D 2
3: D E 1
4: A B 1
5: B C 1
6: D F 1
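Explanation:
setDT(df): converts the data.frame into a data.table object
[, TASK0 := shift(TASK), OBJECT]: gets the previous TASK within each OBJECT
!is.na(TASK0): drops the first row of each OBJECT (these rows have no PREDECESSOR)
.N, .(TASK, TASK0): counts the occurrences of each combination of TASK and TASK0 (the previous letter)
sum(N): totals the counts
Data (df):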
structure(list(OBJECT = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
TASK = c("A", "C", "D", "E", "A", "B", "C", "D", "F")), .Names = c("OBJECT",
"TASK"), row.names = c(NA, -9L), class = c("data.table", "data.frame"
))
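The explanation above can be followed step by step with the sketch below. It rebuilds the sample data so that it runs on its own, prints the intermediate TASK0 column, and collapses the count-then-sum chain into a single grouped .N. That simplification is an assumption on my part rather than the answer's exact code, although it should give the same table on this data.

```r
library(data.table)

# Sample data from the question, rebuilt here so the snippet is self-contained
df <- data.table(
  OBJECT = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
  TASK   = c("A", "C", "D", "E", "A", "B", "C", "D", "F")
)

# Step 1: previous TASK within each OBJECT (NA on each object's first row)
df[, TASK0 := shift(TASK), by = OBJECT]
print(df)

# Step 2: drop the first row of each object, then count every
# (PREDECESSOR, SUCCESSOR) pair in one grouped pass
pairs <- df[!is.na(TASK0), .(COUNT = .N),
            by = .(PREDECESSOR = TASK0, SUCCESSOR = TASK)]
print(pairs)
```

Because shift() is applied per OBJECT, the last task of object 1 is never paired with the first task of object 2.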
Answer 1 (score: 2)
aggregate(COUNT~.,
data.frame(PREDECESSOR = head(df1$TASK, -1),
SUCCESSOR = tail(df1$TASK, -1),
COUNT = 1),
length)
# PREDECESSOR SUCCESSOR COUNT
#1 E A 1
#2 A B 1
#3 A C 1
#4 B C 1
#5 C D 2
#6 D E 1
#7 D F 1
You can use a similar approach even if you want to split by OBJECT.ID first:
temp = do.call(rbind, lapply(split(df1, df1$OBJECT.ID), function(X){
aggregate(COUNT~., data.frame(PREDECESSOR = head(X$TASK, -1),
SUCCESSOR = tail(X$TASK, -1),
COUNT = 1),
length)
}))
# sum the per-object counts (length would only count how many objects contain each pair)
aggregate(COUNT~., temp, sum)
# PREDECESSOR SUCCESSOR COUNT
#1 A C 1
#2 B C 1
#3 C D 2
#4 D E 1
#5 A B 1
#6 D F 1
Data:
df1 = structure(list(OBJECT.ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L), TASK = c("A", "C", "D", "E", "A", "B", "C", "D", "F")), .Names = c("OBJECT.ID",
"TASK"), class = "data.frame", row.names = c(NA, -9L))
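As a variation on the head/tail idea, here is a minimal base R sketch that avoids split() by masking out the pairs that straddle two objects. It assumes the rows are already grouped by OBJECT.ID and ordered by task sequence within each object. The names same_object, pairs and result are introduced here for illustration only.

```r
# Sample data from the question, rebuilt here so the snippet is self-contained
df1 <- data.frame(
  OBJECT.ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
  TASK      = c("A", "C", "D", "E", "A", "B", "C", "D", "F"),
  stringsAsFactors = FALSE
)

n <- nrow(df1)

# TRUE where row i and row i + 1 belong to the same object
same_object <- df1$OBJECT.ID[-n] == df1$OBJECT.ID[-1]

# Keep only the consecutive pairs that stay inside one object
pairs <- data.frame(
  PREDECESSOR = df1$TASK[-n][same_object],
  SUCCESSOR   = df1$TASK[-1][same_object],
  COUNT       = 1
)

# Summing the 1s gives the number of occurrences of each pair
result <- aggregate(COUNT ~ ., pairs, sum)
print(result)
```

On the sample data the E to A pair that crosses from object 1 to object 2 is dropped, leaving six pairs with C to D counted twice.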
Answer 2 (score: 1)
To get the counts, you can do it in the following two lines:
cc <- cbind(df$TASK,c(df$TASK[-1],"LAST"))
table(paste(cc[,1],cc[,2],sep="-"))
The result is
A-B A-C B-C C-D D-E D-F E-A F-LAST
1 1 1 2 1 1 1 1
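Note that this result also contains E-A, which crosses from object 1 to object 2, and the F-LAST sentinel. If those are unwanted, one possible adjustment is to build the labels separately for each object, as in the sketch below. It assumes a column named OBJECT as in Answer 0's data. The names labels and counts are introduced here for illustration.

```r
# Sample data from the question, rebuilt here so the snippet is self-contained
df <- data.frame(
  OBJECT = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
  TASK   = c("A", "C", "D", "E", "A", "B", "C", "D", "F"),
  stringsAsFactors = FALSE
)

# Build "PREDECESSOR-SUCCESSOR" labels separately for each object,
# so no pair spans two objects and no sentinel value is needed
labels <- unlist(lapply(split(df$TASK, df$OBJECT), function(x) {
  if (length(x) < 2) return(character(0))  # an object with one task has no pairs
  paste(head(x, -1), tail(x, -1), sep = "-")
}), use.names = FALSE)

# Tabulate how often each label occurs
counts <- table(labels)
print(counts)
```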