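I'm not sure what this problem is even called. Say I'm counting the distinct combinations of two columns, but I want the count to ignore the order of the two columns. Here's what I mean: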
df = data.frame(fruit1 = c("apple", "orange", "orange", "banana", "kiwi"),
                fruit2 = c("orange", "apple", "banana", "orange", "apple"),
                stringsAsFactors = FALSE)
# What I want: total number of fruit combinations, regardless of
# which fruit comes first and which second.
# Eg 2 apple-orange, 2 banana-orange, 1 kiwi-apple
# What I know *doesn't* work:
table(df$fruit1, df$fruit2)
# What *does* work:
library(dplyr)
df %>% group_by(fruit1, fruit2) %>%
  transmute(fruitA = sort(c(fruit1, fruit2))[1],
            fruitB = sort(c(fruit1, fruit2))[2]) %>%
  group_by(fruitA, fruitB) %>%
  summarise(combinations = n())
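As you can see, I have a way to make this work, but is there a name for this general problem? It is a kind of combination problem, but about counting combinations rather than generating them. And what if I had three or four columns of a similar type? The approach above is hard to generalize. Tidyverse solutions most welcome!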
Answer 0 (score: 3)
Use apply and sort to order the values within each row of your data frame; after that we only need group_by and count.
data.frame(t(apply(df, 1, sort))) %>% group_by_all() %>% count()
# A tibble: 3 x 3
# Groups: X1, X2 [3]
#   X1     X2         n
#   <fctr> <fctr> <int>
# 1 apple  kiwi       1
# 2 apple  orange     2
# 3 banana orange     2
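To address the "three or four columns" part of the question, the same apply/sort idea can be wrapped in a small function. This is only a minimal base R sketch, not a tested general solution: the function name `count_unordered` and the output column names `item1`, `item2`, ... are hypothetical, and it assumes every column holds the same kind of value and that there are no missing values (sort drops NA, which would leave rows of unequal length).

```r
# Example data from the question.
df <- data.frame(fruit1 = c("apple", "orange", "orange", "banana", "kiwi"),
                 fruit2 = c("orange", "apple", "banana", "orange", "apple"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: count row-wise combinations, ignoring column order.
# Assumes all columns are of the same type and contain no NA values.
count_unordered <- function(data) {
  # Sort the values within each row. apply() returns one column per
  # input row, so transpose to get back to one row per observation.
  sorted <- t(apply(as.matrix(data), 1, sort))
  sorted <- as.data.frame(sorted, stringsAsFactors = FALSE)
  # After sorting, the original column names no longer mean anything,
  # so use neutral positional names instead.
  names(sorted) <- paste0("item", seq_along(sorted))
  # Add a column of ones and sum it within each distinct sorted row.
  aggregate(n ~ ., data = cbind(sorted, n = 1L), FUN = sum)
}

count_unordered(df)
```

Because nothing in the function refers to a specific number of columns, the same call should work unchanged on a data frame with three or four columns. The final aggregate step could equally be replaced with a dplyr count if a tibble result is preferred.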
Answer 1 (score: 0)
Here is an option using pmap and count.
library(tidyverse)
library(rlang)
pmap_df(df, ~ sort(c(...)) %>%
          as.list %>%
          as_tibble %>%
          set_names(names(df))) %>%
  count(!!! rlang::syms(names(.)))
# A tibble: 3 x 3
# fruit1 fruit2 n
# <chr> <chr> <int>
#1 apple kiwi 1
#2 apple orange 2
#3 banana orange 2