Question

I have a data frame like this:

df <- data.frame(var1 = c("google", "yahoo", "google", "yahoo", "google"), 
                 var2 = c("price1","price1","price1","price1","price2"))

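I want to count how often each pair of values from the two columns occurs, including pairs that never occur. Here is the expected output: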

df_output <- data.frame(var1 = c("google","google","yahoo","yahoo"), 
                        var2 = c("price1","price2","price1","price2"), count = c(2,1,2,0))
df_output
#      var1   var2 count
# 1 google price1     2
# 2 google price2     1
# 3  yahoo price1     2
# 4  yahoo price2     0

How can I do this?

Answer 1

A base R solution:

as.data.frame(table(df$var1, df$var2))
#     Var1   Var2 Freq
# 1 google price1    2
# 2  yahoo price1    2
# 3 google price2    1
# 4  yahoo price2    0
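
If the column names and row order should match the expected output from the question, the same call can be adjusted slightly. This is only a sketch: it relies on the `responseName` argument of `as.data.frame()` for tables, and the name `counts` is chosen here for illustration.

```r
df <- data.frame(var1 = c("google", "yahoo", "google", "yahoo", "google"),
                 var2 = c("price1", "price1", "price1", "price1", "price2"))

# Tabulating the two-column data frame keeps the names var1 and var2;
# responseName renames the default "Freq" column to "count".
counts <- as.data.frame(table(df[c("var1", "var2")]), responseName = "count")

# Sort by var1, then var2, and reset the row names.
counts <- counts[order(counts$var1, counts$var2), ]
rownames(counts) <- NULL
counts
#     var1   var2 count
# 1 google price1     2
# 2 google price2     1
# 3  yahoo price1     2
# 4  yahoo price2     0
```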

Answer 2

One tidyverse option could be:

library(dplyr)
library(tidyr)

df %>%
  count(var1, var2) %>%
  complete(var1, nesting(var2), fill = list(n = 0))

  var1   var2       n
  <fct>  <fct>  <dbl>
1 google price1     2
2 google price2     1
3 yahoo  price1     2
4 yahoo  price2     0

Here, it counts by "var1" and "var2", then generates the missing combinations and fills them with 0.
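
For readers without the tidyverse installed, the same count-then-complete idea can be sketched in base R. This is an illustration of the logic rather than what `count()` and `complete()` do internally, and the names `observed`, `all_pairs` and `result` are made up for this example.

```r
df <- data.frame(var1 = c("google", "yahoo", "google", "yahoo", "google"),
                 var2 = c("price1", "price1", "price1", "price1", "price2"))

# Step 1: count only the pairs that actually occur in the data.
observed <- aggregate(count ~ var1 + var2, data = cbind(df, count = 1), FUN = sum)

# Step 2: build every var1/var2 combination, including unobserved ones.
all_pairs <- expand.grid(var1 = unique(df$var1), var2 = unique(df$var2),
                         stringsAsFactors = FALSE)

# Step 3: left-join the counts onto the full grid and fill the gaps with 0.
result <- merge(all_pairs, observed, by = c("var1", "var2"), all.x = TRUE)
result$count[is.na(result$count)] <- 0
result <- result[order(result$var1, result$var2), ]
result
```

The pair yahoo/price2 has no rows in `df`, so it is missing after step 1 and only appears, with a count of 0, after the join in step 3.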

Answer 3

Using dcast and melt from the reshape2 package:

> library(reshape2)
> as.data.frame(melt(dcast(df, var1 ~ var2, fun.aggregate = length, value.var = "var2"), id.vars = "var1"))

OR

If you have many columns, pass their names as a vector:

> var_select = c("var1", "var2")
> as.data.frame(table(subset(df, select = var_select)))

   var1   var2  Freq
1 google price1    2
2  yahoo price1    2
3 google price2    1
4  yahoo price2    0

Note: the second solution is based on the table function suggested by @thothal.
