Question

This question builds on the following post, with additional requirements (Iterate through columns in dplyr?).

The original code is as follows:

library(dplyr)

df <- data.frame(col1 = rep(1, 15),
                 col2 = rep(2, 15),
                 col3 = rep(3, 15),
                 group = c(rep("A", 5), rep("B", 5), rep("C", 5)))

for (col in c("col1", "col2", "col3")) {
  filt.df <- df %>%
    filter(group == "A") %>%
    # select_() is deprecated; all_of() selects columns named in a character vector
    select(all_of(c("group", col)))
  # do other things, like ggplotting
  print(filt.df)
}

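My goal is to output a frequency table for each unique COL, for each GROUP value. The current example hard-codes the dplyr filter to a single GROUP value (A, B, or C). In my case, I want to iterate (loop) over a list of the values in GROUP (list <- c("A", "B", "C")) and generate a frequency table for each combination.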

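The frequency tables are based on counts. For Col1, the result would look similar to the table below. The example dataset is simplified; my real dataset is more complex, with multiple values per group. I need to iterate over Col1-Col3 by group.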

group value n prop
A     1     5 .1
B     2     5 .1
C     3     5 .1

A better example of a frequency table is: How to use dplyr to generate a frequency table

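I struggled with this for a few days, and my example could have been better. Thanks for the posts. Below is what I ended up using to solve it. The result is a series of frequency tables, one for each column within each unique value of group. I have 3 columns (col1, col2, col3) and 3 unique values in group (A, B, C), so 3x3. The result is 9 frequency tables, plus a frequency table of the group column itself for each group value, which is not meaningful. I'm sure there is a better way to do this. The output also generates some labels, which is useful.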

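# Build the list of unique groups
group <- unique(df$group)

# Generate frequency tables via a loop.
# Note: freq() is not base R; it comes from whichever package was loaded
# in the original session (the post does not say which).
iterate_by_group <- function(x) {
  for (i in seq_along(group)) {
    # Use the argument x rather than the global df
    filt.df <- x[x$group == group[i], ]
    print(lapply(filt.df, freq))
  }
}

# Run
iterate_by_group(df)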
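
For anyone who does not have a freq() function available, here is a rough base-R sketch of the same loop. It assumes table() is an acceptable stand-in for freq(), and that prop should be the share within each group; both are my assumptions, not something stated above. The names value_cols and freq_tables are made up for illustration. Splitting only the value columns also avoids the meaningless table for the group column itself.

```r
# Same example data as in the question
df <- data.frame(col1 = rep(1, 15),
                 col2 = rep(2, 15),
                 col3 = rep(3, 15),
                 group = c(rep("A", 5), rep("B", 5), rep("C", 5)))

# Hypothetical name: the columns to tabulate (group itself is excluded)
value_cols <- c("col1", "col2", "col3")

# Split the value columns by group, then tabulate each column within each group
freq_tables <- lapply(split(df[value_cols], df$group), function(g) {
  lapply(g, function(v) {
    n <- table(v)
    data.frame(value = names(n),
               n     = as.vector(n),
               prop  = as.vector(n) / length(v))  # assumed: share within the group
  })
})

# One table per group/column pair, e.g. group A, col1
print(freq_tables$A$col1)
```

The result is a nested list indexed first by group and then by column, so freq_tables$B$col3 gives the table for col3 within group B.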

Answer 1

Is this what you want?

df %>%
  group_by(group) %>%
  # funs() is deprecated; a named list gives the same col1_freq, col2_freq, ... names
  summarise_all(list(freq = sum))

Answer 2

We can gather the data into long format, then get the frequency (n()) by group:

library(tidyverse)
gather(df, value, val, col1:col3) %>%
        group_by(group, value = parse_number(value)) %>% 
        summarise(n = n(), prop = n/nrow(.))
# A tibble: 9 x 4
# Groups:   group [?]
#  group value     n  prop
#  <fct> <dbl> <int> <dbl>
#1 A         1     5 0.111
#2 A         2     5 0.111
#3 A         3     5 0.111
#4 B         1     5 0.111
#5 B         2     5 0.111
#6 B         3     5 0.111
#7 C         1     5 0.111
#8 C         2     5 0.111
#9 C         3     5 0.111
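
With newer tidyr, where gather() is superseded, roughly the same idea can be sketched with pivot_longer() and count(). This is only a sketch under a couple of assumptions: the column names column and value and the result name freq_long are made up here, and prop is computed within each group/column pair rather than over all rows as in the output above, so the numbers differ.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
df <- data.frame(col1 = rep(1, 15),
                 col2 = rep(2, 15),
                 col3 = rep(3, 15),
                 group = c(rep("A", 5), rep("B", 5), rep("C", 5)))

freq_long <- df %>%
  # Long format: one row per (group, column, value)
  pivot_longer(col1:col3, names_to = "column", values_to = "value") %>%
  # Count each distinct value per group and column
  count(group, column, value) %>%
  # Assumed definition: proportion within each group/column pair
  group_by(group, column) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

print(freq_long)
```

If one table per group is needed rather than a single long tibble, split(freq_long, freq_long$group) should give a list of them.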
