This is an extension of the question here.
I have a data frame like this:
df<-structure(list(person = c("p1", "p1", "p1", "p1", "p1", "p1",
"p1", "p2", "p2", "p2", "p3", "p3", "p3", "p4", "p4", "p4", "p5",
"p5", "p5", "p6", "p6", "p6", "p7", "p7", "p7"), hp_char = c("hp1",
"hp2", "hp3", "hp4", "hp5", "hp6", "hp7", "hp8", "hp9", "hp10",
"hp1", "hp2", "hp3", "hp5", "hp6", "hp7", "hp8", "hp9", "hp10",
"hp3", "hp4", "hp5", "hp1", "hp2", "hp3")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -25L), .Names = c("person",
"hp_char"), spec = structure(list(cols = structure(list(person = structure(list(), class = c("collector_character",
"collector")), hp_char = structure(list(), class = c("collector_character",
"collector"))), .Names = c("person", "hp_char")), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
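Following the very efficient self-join / data.table answer provided by Uwe, I get the number of co-occurrences of two "hp_id" values (the hp_char column here) as follows: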
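library(data.table)
# self-join on person, then keep each unordered pair once and count it
df_by2 <- setDT(df)[df, on = "person", allow.cartesian = TRUE][
  hp_char < i.hp_char, .N, by = .(HP_ID1 = hp_char, HP_ID2 = i.hp_char)]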
This gives me
HP_ID1 HP_ID2 N
1: hp1 hp2 3
2: hp1 hp3 3
3: hp2 hp3 3
4: hp1 hp4 1
5: hp2 hp4 1
6: hp3 hp4 2
7: hp1 hp5 1
8: hp2 hp5 1
9: hp3 hp5 2
10: hp4 hp5 2
11: hp1 hp6 1
12: hp2 hp6 1
13: hp3 hp6 1
14: hp4 hp6 1
15: hp5 hp6 2
16: hp1 hp7 1
17: hp2 hp7 1
18: hp3 hp7 1
19: hp4 hp7 1
20: hp5 hp7 2
21: hp6 hp7 2
22: hp10 hp8 2
23: hp8 hp9 2
24: hp10 hp9 2
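However, I would like to know whether this approach can be extended to count co-occurrences of more than two "hp_char" values. In other words, I am looking for output like the following (for example, the number of times 3 events occur together):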
HP_ID1 HP_ID2 HP_ID3 N
1 hp1 hp2 hp3 3
2 hp3 hp4 hp5 2
3 hp5 hp6 hp7 2
4 hp8 hp9 hp10 2
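So far I have found several solutions for the co-occurrence of two events, but they do not seem to generalize to counting instances of more than two events. Thanks for your help!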
Answer 0 (score: 2)
It may be cleaner to use a combinatorial approach:
library(data.table)
setDT(df)
nset <- 3
cols <- paste0("hp_char", seq_len(nset))
#create combinations of nset number of skills
combi <- do.call(CJ, rep(df[,.(unique(hp_char))], nset))
setnames(combi, cols)
#create for each person the combinations of nset number of skills
nsetSkills <- df[, do.call(CJ, rep(.(hp_char), nset)), by=.(person)]
setnames(nsetSkills, names(nsetSkills)[-1L], cols)
#join the above 2 sets and calculate the occurrence for each row in combi
ans <- nsetSkills[combi, on=cols, .N, by=.EACHI]
ans
Output:
hp_char1 hp_char2 hp_char3 N
1: hp1 hp1 hp1 3
2: hp1 hp1 hp10 0
3: hp1 hp1 hp2 3
4: hp1 hp1 hp3 3
5: hp1 hp1 hp4 1
---
996: hp9 hp9 hp5 0
997: hp9 hp9 hp6 0
998: hp9 hp9 hp7 0
999: hp9 hp9 hp8 2
1000: hp9 hp9 hp9 2
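The result above lists all 10^3 = 1000 ordered triples, including repeats such as hp1 hp1 hp1 and rows with a count of 0, whereas the question asks for each unordered set once. One possible way to narrow it down is sketched below: it reuses the per-person cross join from this answer and keeps only rows whose values are strictly increasing across the columns. The compact `df` is just a rebuilt copy of the question's data, and the names `increasing` and `ans_sets` are made up for this illustration.

```r
library(data.table)

# Compact rebuild of the question's data (same 25 rows)
df <- data.table(
  person  = rep(paste0("p", 1:7), times = c(7, 3, 3, 3, 3, 3, 3)),
  hp_char = paste0("hp", c(1:7, 8:10, 1:3, 5:7, 8:10, 3:5, 1:3))
)

nset <- 3
cols <- paste0("hp_char", seq_len(nset))

# Per-person cross join of that person's values, as in the answer above
nsetSkills <- df[, do.call(CJ, rep(list(hp_char), nset)), by = person]
setnames(nsetSkills, names(nsetSkills)[-1L], cols)

# TRUE where column 1 < column 2 < ... < column nset (string comparison),
# so each unordered set survives exactly once per person
increasing <- Reduce(`&`, Map(function(a, b) nsetSkills[[a]] < nsetSkills[[b]],
                              cols[-nset], cols[-1L]))

# Count persons per set; sets that never occur simply do not appear
ans_sets <- nsetSkills[increasing, .N, by = cols]
ans_sets[N >= 2]
```

Because the comparison is on strings, "hp10" sorts before "hp8", so the last set shows up as hp10 hp8 hp9 rather than hp8 hp9 hp10.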
Answer 1 (score: 1)
You can do a double self-join; the rest is almost the same:
df2 <- setDT(df)[df, on = "person", allow.cartesian = TRUE][df,
                 on = "person", allow.cartesian = TRUE]
df2[hp_char < i.hp_char & i.hp_char < i.hp_char.1,
.N, by = .(HP_ID1 = hp_char,
HP_ID2 = i.hp_char,
HP_ID3 = i.hp_char.1)][N >= 2]
# HP_ID1 HP_ID2 HP_ID3 N
#1: hp1 hp2 hp3 3
#2: hp3 hp4 hp5 2
#3: hp5 hp6 hp7 2
#4: hp10 hp8 hp9 2
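Each extra item in the set needs another self-join, so the intermediate table grows quickly. As a rough alternative for arbitrary set sizes, the sketch below enumerates each person's k-item sets directly with `combn()` and then counts them. This is not the method from either answer: `count_sets` is a hypothetical helper written for this illustration, and the compact `df` is a rebuilt copy of the question's data.

```r
library(data.table)

# Compact rebuild of the question's data (same 25 rows)
df <- data.table(
  person  = rep(paste0("p", 1:7), times = c(7, 3, 3, 3, 3, 3, 3)),
  hp_char = paste0("hp", c(1:7, 8:10, 1:3, 5:7, 8:10, 3:5, 1:3))
)

# Hypothetical helper: number of persons sharing each k-item set of hp_char
count_sets <- function(dt, k) {
  per_person <- dt[, {
    # radix sort gives a locale-independent order, so every person
    # produces the same ordering for the same set
    items <- sort(unique(hp_char), method = "radix")
    # persons with fewer than k items contribute nothing
    if (length(items) >= k) as.data.table(t(combn(items, k)))
  }, by = person]
  id_cols <- paste0("HP_ID", seq_len(k))
  setnames(per_person, paste0("V", seq_len(k)), id_cols)
  per_person[, .N, by = id_cols][order(-N)]
}

triples <- count_sets(df, 3)
triples[N >= 2]
```

Calling `count_sets(df, 2)` should give the same pair counts as the single self-join in the question, and larger `k` works without changing the code, although the number of sets per person still grows combinatorially.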