我想用item
计算group
的所有成对组合的频率。
have <- data.frame(group=c("a", "a", "a",
"b", "b",
"c",
"d", "d",
"e", "e",
"f", "f", "f"),
item=c("apple", "banana", "black cherry",
"apple", "black cherry",
"orange",
"banana", "black cherry",
"banana", "black cherry",
"apple", "banana", "black cherry"))
have
# group item
# 1 a apple
# 2 a banana
# 3 a black cherry
# 4 b apple
# 5 b black cherry
# 6 c orange
# 7 d banana
# 8 d black cherry
# 9 e banana
# 10 e black cherry
# 11 f apple
# 12 f banana
# 13 f black cherry
# almost what I want...
# cons: repeats pairs and does not include zeros
have %>%
# https://stackoverflow.com/a/38335011/841405
full_join(have, by="group") %>%
group_by(item.x, item.y) %>%
summarise(length(unique(group))) %>%
filter(item.x!=item.y) %>%
mutate(item = paste(item.x, item.y, sep=", "))
# item.x item.y `length(unique(group))` item
# 1 apple banana 2 apple, banana
# 2 apple black cherry 3 apple, black cherry
# 3 banana apple 2 banana, apple
# 4 banana black cherry 4 banana, black cherry
# 5 black cherry apple 3 black cherry, apple
# 6 black cherry banana 4 black cherry, banana
# want I really want
# item.x item.y `length(unique(group))` item
# 1 apple banana 2 apple, banana
# 2 apple black cherry 3 apple, black cherry
# 3 apple orange 0 apple, orange
# 4 banana black cherry 4 banana, black cherry
# 5 banana orange 0 banana, orange
# 6 black cherry orange 0 black cherry, orange
答案 0 :(得分:3)
我这样做的方法是使用expand.grid
进行每种组合,然后加入已经完成的组合,然后用零填充不匹配的行。我也将您的计数重命名为n。
have2 = have %>%
full_join(have, by="group") %>%
group_by(item.x, item.y) %>%
summarise(n = length(unique(group))) %>%
filter(item.x!=item.y) %>%
mutate(item = paste(item.x, item.y, sep=", "))
combos = expand.grid(item.x = unique(have$item),
item.y = unique(have$item)) %>%
filter(as.numeric(item.x) < as.numeric(item.y)) %>%
mutate(item = paste(item.x, item.y, sep = ', ')) %>%
arrange(item.x, item.y) %>%
left_join(have2) %>%
mutate(n = replace(n, is.na(n), 0))