I have a data frame like this:
df <- data.frame(var1 = c("google", "yahoo", "google", "yahoo", "google"),
var2 = c("price1","price1","price1","price1","price2"))
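I want to count how often each pair of values from the two columns occurs, including pairs that never occur. Here is the expected output: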
df_output <- data.frame(var1 = c("google","google","yahoo","yahoo"),
var2 = c("price1","price2","price1","price2"), count = c(2,1,2,0))
df_output
# var1 var2 count
# 1 google price1 2
# 2 google price2 1
# 3 yahoo price1 2
# 4 yahoo price2 0
How can I do this?
Answer 0: (score: 4)
A base R solution:
as.data.frame(table(df$var1, df$var2))
# Var1 Var2 Freq
# 1 google price1 2
# 2 yahoo price1 2
# 3 google price2 1
# 4 yahoo price2 0
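To get the column names and row order shown in the question, the `table` result can be renamed and sorted. This is a minimal sketch that assumes the `df` from the question; `res` is just an illustrative name.

```r
df <- data.frame(var1 = c("google", "yahoo", "google", "yahoo", "google"),
                 var2 = c("price1", "price1", "price1", "price1", "price2"))

# Passing the columns as a data frame keeps "var1" and "var2" as dimension names;
# responseName renames the default "Freq" column to "count".
res <- as.data.frame(table(df[c("var1", "var2")]), responseName = "count")

# Sort by var1, then var2, and reset the row names.
res <- res[order(res$var1, res$var2), ]
rownames(res) <- NULL
res
```

Note that `var1` and `var2` come back as factors here, so convert them with `as.character` if plain strings are needed.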
Answer 1: (score: 3)
One tidyverse possibility could be:
library(dplyr)  # count() and %>%
library(tidyr)  # complete() and nesting()
df %>%
count(var1, var2) %>%
complete(var1, nesting(var2), fill = list(n = 0))
# var1 var2 n
# <fct> <fct> <dbl>
# 1 google price1 2
# 2 google price2 1
# 3 yahoo price1 2
# 4 yahoo price2 0
Here, it counts by `var1` and `var2`, then generates the missing combinations and fills their counts with 0.
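The two stages can be inspected separately to see where the zero row comes from. This is a sketch assuming the `df` from the question and that dplyr and tidyr are installed; `counted` and `completed` are illustrative names, and `complete(var1, var2)` is used here as a simpler equivalent for this data.

```r
library(dplyr)
library(tidyr)

df <- data.frame(var1 = c("google", "yahoo", "google", "yahoo", "google"),
                 var2 = c("price1", "price1", "price1", "price1", "price2"))

# Stage 1: count() only returns pairs that actually occur in the data.
counted <- df %>% count(var1, var2)
nrow(counted)    # the yahoo/price2 pair is absent at this point

# Stage 2: complete() adds every missing var1 x var2 combination
# and fills its count with 0 instead of NA.
completed <- counted %>% complete(var1, var2, fill = list(n = 0))
nrow(completed)
```

Without the `fill` argument, the added rows would contain `NA` in `n` rather than 0.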
Answer 2: (score: 1)
Using dcast and melt (from the reshape2 package):
> library(reshape2)
> as.data.frame(melt(dcast(df, var1 ~ var2, value.var = "var2", fun.aggregate = length), id.vars = "var1"))
OR
If you have many columns, pass their names as a vector:
> var_select = c("var1", "var2")
> as.data.frame(table(subset(df, select = var_select)))
var1 var2 Freq
1 google price1 2
2 yahoo price1 2
3 google price2 1
4 yahoo price2 0
Note: the second solution is based on the table function suggested by @thothal.