I have a data frame like the following:
df1 <- data.frame(id = rep(1:2, each = 4),
                  word = c('apple', 'pear', 'orange', 'banana',
                           'apple', 'watermellon', 'orange', 'grape'))
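I need to reshape it into the table below. Within each group (id), every word in the word column should be paired with every other word, giving two columns, word1 and word2: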
id  word1   word2
1   apple   pear
1   apple   orange
1   apple   banana
1   pear    orange
1   pear    banana
1   orange  banana
2   apple   watermellon
2   apple   orange
...
Answer 0 (score: 1)
We can group by 'id', use combn to get the pairwise combinations of 'word', and then unnest the output:
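library(dplyr)
library(tidyr)  # unnest() comes from tidyr, not dplyr

df1 %>%
  group_by(id) %>%
  # one list element per id, holding a list of one-row tibbles (one per pair)
  summarise(out = list(combn(word, 2, FUN = function(x)
      tibble(word1 = x[1], word2 = x[2]), simplify = FALSE))) %>%
  unnest(out) %>%  # one row per pair; out is still a list of tibbles
  unnest(out)      # spread each tibble into the word1 and word2 columns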
# A tibble: 12 x 3
# id word1 word2
# <int> <fct> <fct>
# 1 1 apple pear
# 2 1 apple orange
# 3 1 apple banana
# 4 1 pear orange
# 5 1 pear banana
# 6 1 orange banana
# 7 2 apple watermellon
# 8 2 apple orange
# 9 2 apple grape
#10 2 watermellon orange
#11 2 watermellon grape
#12 2 orange grape
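To see what combn() hands to FUN, here is a minimal sketch on the words of group 1 alone (`words`, `pairs` and `first_pair` are illustrative names, not part of the answer). combn() returns one column per pair, walking the input in its original order, which is why the result above keeps "orange banana" rather than sorting it alphabetically:

```r
# words of group id == 1 from the question's data
words <- c("apple", "pear", "orange", "banana")

# default: a 2 x choose(n, 2) matrix, one column per pair
pairs <- combn(words, 2)
print(pairs)
print(ncol(pairs))  # choose(4, 2) = 6 pairs

# simplify = FALSE: a list of length-2 vectors instead of a matrix
first_pair <- combn(words, 2, simplify = FALSE)[[1]]
print(first_pair)
```

The `FUN` argument in the answer is applied to each of those length-2 vectors, so `x[1]` and `x[2]` become word1 and word2.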
Or with data.table:
library(data.table)
# combn() returns a 2 x k matrix per id; its rows become word1 and word2
setDT(df1)[, {
  pairs <- combn(as.character(word), 2)
  list(word1 = pairs[1, ], word2 = pairs[2, ])
}, by = id]
Note: this use of combn generates only the combinations that are actually needed, so no join is involved.
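To put the note in numbers, a small back-of-the-envelope sketch, assuming each id holds n distinct words (n = 4 in the example); the variable names are illustrative:

```r
n_words <- 4                       # words per id in the example data

combn_rows <- choose(n_words, 2)   # pairs produced directly by combn(): 6
join_rows  <- n_words^2            # rows a self-join on id produces per group: 16

# a join-based approach must then drop the n self-pairs and half of the
# remaining mirrored pairs to reach the same result
kept_after_filter <- (join_rows - n_words) / 2

print(c(combn = combn_rows, join = join_rows, kept = kept_after_filter))
```

The gap grows quadratically: for 100 words in one id, combn() yields choose(100, 2) = 4950 rows while a self-join materialises 10000 before filtering.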
Data:

df1 <- data.frame(id = rep(1:2, each = 4),
                  word = c('apple', 'pear', 'orange', 'banana',
                           'apple', 'watermellon', 'orange', 'grape'))
Answer 1 (score: 1)
Here is a dplyr solution that joins the data frame to itself and then removes the unwanted pairs:
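library(dplyr)

df1 %>%
  inner_join(df1, by = "id") %>%
  filter(
    # drop pairs of a word with itself
    word.x != word.y &
    # drop mirrored pairs: sorting each row makes (pear, apple) match (apple, pear);
    # id stays in the key, so the same pair in different groups is kept
    !duplicated(t(apply(., 1, sort)))
  ) %>%
  rename(word1 = word.x, word2 = word.y)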
id word1 word2
1 1 apple pear
2 1 apple orange
3 1 apple banana
4 1 pear orange
5 1 pear banana
6 1 orange banana
7 2 apple watermellon
8 2 apple orange
9 2 apple grape
10 2 watermellon orange
11 2 watermellon grape
12 2 orange grape
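The row-wise sort in the filter above is what collapses (pear, apple) onto (apple, pear). As an alternative sketch (not part of the original answer), one can record each word's position inside its id group and keep only pairs whose left word comes first; this drops self-pairs and mirrored pairs in one comparison and preserves the input order without sorting whole rows. The helper column `pos` and the objects `indexed` and `result` are illustrative names. Recent dplyr versions (1.1.0 and later) may warn about a many-to-many relationship in the join; that is expected for a self-join on id.

```r
library(dplyr)

df1 <- data.frame(id = rep(1:2, each = 4),
                  word = c("apple", "pear", "orange", "banana",
                           "apple", "watermellon", "orange", "grape"),
                  stringsAsFactors = FALSE)

# position of each word within its id group (1, 2, 3, ...)
indexed <- df1 %>%
  group_by(id) %>%
  mutate(pos = row_number()) %>%
  ungroup()

# self-join on id, then keep a pair only when the left word comes earlier:
# this removes self-pairs (pos.x == pos.y) and mirrored pairs (pos.x > pos.y)
result <- indexed %>%
  inner_join(indexed, by = "id") %>%
  filter(pos.x < pos.y) %>%
  select(id, word1 = word.x, word2 = word.y)

print(result)
```

Because the comparison uses positions rather than word values, a word that appears twice in the same group would still be paired with its second occurrence, which the `word.x != word.y` filter would drop.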