Question

I have grouped and summarised a data frame in R, so I now have a table like this:

Group | Value | Count
==========================
   A  |   1   |   4
   A  |   2   |   2
   A  |   10  |   4
   B  |   3   |   2
   B  |   4   |   4
   B  |   2   |   3
   C  |   5   |   3
   C  |   2   |   6

I am interested in finding the relative frequency of the value 2 within each group:

Group | Relative freq of 2
==========================
   A  |  2/(4+2+4) = 0.2
   B  |  3/(2+4+3) = 0.33
   C  |  6/(3+6) = 0.67

Is there a simple, elegant way to calculate this in R, other than writing a bunch of code with loops and conditionals? Possibly using dplyr.
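For anyone who wants to run the answers, the table above can be rebuilt roughly as follows. This is only a sketch: the name `df1` is an assumption chosen to match what the answers use, and the column types (character and numeric) are a guess.

```r
# Rebuild the summarised table from the question.
# The name df1 and the column types are assumptions, not given in the question.
df1 <- data.frame(
  Group = c("A", "A", "A", "B", "B", "B", "C", "C"),
  Value = c(1, 2, 10, 3, 4, 2, 5, 2),
  Count = c(4, 2, 4, 2, 4, 3, 3, 6),
  stringsAsFactors = FALSE
)

# Total count per group: the denominator of the relative frequency
totals <- tapply(df1$Count, df1$Group, sum)
print(totals)
```

The totals (10, 9 and 9 for A, B and C) are the denominators shown in the second table.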

Answer 1

Using dplyr, after grouping by 'Group', we subset 'Count' where 'Value' is 2 (assuming there is only a single 'Value' of 2 per 'Group') and divide by the sum of 'Count'.

library(dplyr)
df1 %>%
   group_by(Group) %>% 
   summarise(RelFreq = round(Count[Value==2]/sum(Count), 2))
#  Group RelFreq
#  <chr>   <dbl>
#1     A    0.20
#2     B    0.33
#3     C    0.67

The corresponding data.table option is

library(data.table)
setDT(df1)[, .(RelFreq = round(Count[Value == 2]/sum(Count),2)), by = Group]
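The dplyr code in this answer assumes exactly one row with a 'Value' of 2 per group. If that might not hold, one possible variation is to wrap the subset in `sum()`, so that a group with no such row gives 0 and a group with several such rows has them added together. The sketch below rebuilds the question's data and adds a hypothetical group "D" (not in the original data) purely to show the zero case.

```r
library(dplyr)

# Data from the question; the name df1 follows the answer above
df1 <- data.frame(
  Group = c("A", "A", "A", "B", "B", "B", "C", "C"),
  Value = c(1, 2, 10, 3, 4, 2, 5, 2),
  Count = c(4, 2, 4, 2, 4, 3, 3, 6),
  stringsAsFactors = FALSE
)

# Hypothetical extra group with no Value of 2, for illustration only
df_extra <- rbind(df1, data.frame(Group = "D", Value = 7, Count = 5))

rel_freq <- df_extra %>%
  group_by(Group) %>%
  # sum() of an empty subset is 0, so groups without Value 2 get RelFreq 0
  summarise(RelFreq = round(sum(Count[Value == 2]) / sum(Count), 2))

print(rel_freq)
```

Groups A, B and C give the same values as before (0.20, 0.33, 0.67), and the made-up group D comes out as 0.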

Answer 2

Here is a base R solution:

sapply(split(df1, df1$Group), 
   function(x) round(sum(x$Count[x$Value == 2]) / sum(x$Count), 2))

##  A    B    C 
## 0.20 0.33 0.67

Answer 3

You can use the same logic with a for loop
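A minimal sketch of what such a loop could look like, assuming the data frame is called `df1` as in the other answers. This is an illustration of the idea in base R, not necessarily the answerer's own code; the variable names `groups`, `rows` and `rel_freq` are made up for the example.

```r
# Data from the question; the name df1 follows the other answers
df1 <- data.frame(
  Group = c("A", "A", "A", "B", "B", "B", "C", "C"),
  Value = c(1, 2, 10, 3, 4, 2, 5, 2),
  Count = c(4, 2, 4, 2, 4, 3, 3, 6),
  stringsAsFactors = FALSE
)

groups <- unique(df1$Group)

# Named numeric vector to hold one result per group
rel_freq <- setNames(numeric(length(groups)), groups)

for (g in groups) {
  rows <- df1[df1$Group == g, ]
  # Count for Value 2 divided by the total count of the group
  rel_freq[g] <- round(sum(rows$Count[rows$Value == 2]) / sum(rows$Count), 2)
}

print(rel_freq)
```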


Answer 4

Here is one with sqldf:

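library(sqldf)
# `df` is the summarised data frame from the question (named df1 in the other answers)
# Count for Value 2 in each group
df1 <- sqldf('select `Group`,`Count` from df where Value=2')
# Total count per group
df2 <- sqldf('select `Group`, sum(`Count`) as `Count` from df group by `Group`')
# Divide row by row; this assumes both results list the groups in the same order
df1$Count <- df1$Count / df2$Count
df1
#   Group     Count
# 1     A 0.2000000
# 2     B 0.3333333
# 3     C 0.6666667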
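The two-query version depends on both results having the groups in the same row order. One way to avoid that, sketched here as a possible alternative rather than the answerer's method, is to do the whole calculation in a single query with a conditional sum. The result name `res` and the column alias `RelFreq` are chosen for the example, and `df` is assumed to hold the question's data.

```r
library(sqldf)

# Data from the question; this answer calls the input data frame df
df <- data.frame(
  Group = c("A", "A", "A", "B", "B", "B", "C", "C"),
  Value = c(1, 2, 10, 3, 4, 2, 5, 2),
  Count = c(4, 2, 4, 2, 4, 3, 3, 6),
  stringsAsFactors = FALSE
)

# Conditional sum over the total, computed per group in one pass.
# Multiplying by 1.0 guards against integer division if Count is stored as integer.
res <- sqldf('
  select `Group`,
         sum(case when Value = 2 then `Count` else 0 end) * 1.0 / sum(`Count`) as RelFreq
  from df
  group by `Group`
  order by `Group`
')

print(res)
```

A group with no row where Value is 2 would come out as 0 here instead of being dropped.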

Grouped calculations in R