Frequency table by group with counts of values in R

Time: 2018-08-18 13:56:47

Tags: r dplyr data.table lapply

Suppose this is my dataset

(dput)
dataset<-structure(list(group1 = structure(c(2L, 2L, 2L, 2L, 2L, 1L, 1L, 
1L, 1L), .Label = c("b", "x"), class = "factor"), group2 = structure(c(2L, 
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("g", "y"), class = "factor"), 
    var1 = c(2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L)), .Names = c("group1", 
"group2", "var1"), class = "data.frame", row.names = c(NA, -9L
))

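I need to compute frequencies for two groups:

x+y
b+g

For the variable var1, I need the count of 1 values and the count of 2 values within each group. So the desired output is: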

group1  group2  total_count_of_group  var1-1  var1-2
x       y       5                     3       2
b       g       4                     2       2

This output means that total_count_of_group for x + y is 5 obs. in that group, where the value 1 occurs 3 times and the value 2 occurs 2 times.

Similarly, total_count_of_group for b + g is 4 obs. in that group, where the value 1 occurs 2 times and the value 2 occurs 2 times.

How can I get such a table?

5 answers:

Answer 0: (score: 4)

This can be solved in two steps:

  1. Aggregate the group totals and update dataset by reference
  2. Reshape from long to wide format

Using data.table:

library(data.table)
# Step 1: add the per-group row count by reference; step 2: reshape long to wide
dcast(setDT(dataset)[, total_count_of_group := .N, by = .(group1, group2)],
      group1 + group2 + total_count_of_group ~ paste0("var1_", var1), length)
   group1 group2 total_count_of_group var1_1 var1_2
1:      b      g                    4      2      2
2:      x      y                    5      3      2

Note that this works for any number of distinct values in var1 as well as for any number of groups.

Answer 1: (score: 3)

You can generate three tables, select the relevant counts, and then merge them into a single data frame.
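One way the three-table idea could look in base R is sketched below. This is only an illustration under stated assumptions: the helper `count_by_group` and the column names `var1_1` and `var1_2` are hypothetical and do not come from the answer itself.

```r
# Rebuild the example data from the question.
dataset <- data.frame(
  group1 = factor(c("x", "x", "x", "x", "x", "b", "b", "b", "b")),
  group2 = factor(c("y", "y", "y", "y", "y", "g", "g", "g", "g")),
  var1   = c(2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L)
)

# Hypothetical helper: count rows per (group1, group2) pair and
# store the count in a column called `name`.
count_by_group <- function(df, name) {
  out <- aggregate(var1 ~ group1 + group2, data = df, FUN = length)
  names(out)[3] <- name
  out
}

# Table 1: all rows; tables 2 and 3: only rows where var1 is 1 or 2.
totals <- count_by_group(dataset, "total_count_of_group")
ones   <- count_by_group(dataset[dataset$var1 == 1, ], "var1_1")
twos   <- count_by_group(dataset[dataset$var1 == 2, ], "var1_2")

# Merge the three tables on the grouping columns.
# all = TRUE keeps a group even if one value never occurs in it (NA count).
result <- Reduce(
  function(a, b) merge(a, b, by = c("group1", "group2"), all = TRUE),
  list(totals, ones, twos)
)
result
```

For the example data this gives 4, 2, 2 for b + g and 5, 3, 2 for x + y, matching the desired output. Unlike the reshape-based answers, this sketch needs one extra table per distinct value of var1.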


Answer 2: (score: 3)

library(tidyverse)

dataset %>%
  group_by(group1, group2) %>%             # for each combination of groups
  mutate(counts = n()) %>%                 # count number of rows
  count(group1, group2, var1, counts) %>%  # count unique combinations 
  spread(var1, n, sep = "_") %>%           # reshape dataset
  ungroup()                                # forget the grouping

# # A tibble: 2 x 5
#   group1 group2 counts var1_1 var1_2
#   <fct>  <fct>   <int>  <int>  <int>
# 1 b      g           4      2      2
# 2 x      y           5      3      2

Answer 3: (score: 1)

Here is an option using base R
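A minimal sketch of what a base R option could look like, using `table()` on a combined group key. This is an assumption about the approach, not necessarily the answerer's code; the names `key`, `counts`, `wide` and the `var1_` column prefix are chosen here for illustration.

```r
# Rebuild the example data from the question.
dataset <- data.frame(
  group1 = factor(c("x", "x", "x", "x", "x", "b", "b", "b", "b")),
  group2 = factor(c("y", "y", "y", "y", "y", "g", "g", "g", "g")),
  var1   = c(2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L)
)

# One key per observed (group1, group2) pair, e.g. "x.y" and "b.g".
key <- interaction(dataset$group1, dataset$group2, drop = TRUE)

# Cross-tabulate the group key against var1.
counts <- table(key, dataset$var1)

# Turn the contingency table into a data frame with one column per var1 value.
wide <- as.data.frame.matrix(counts)
names(wide) <- paste0("var1_", colnames(counts))

# Row sums of the table are the total number of observations per group.
result <- cbind(
  group = rownames(counts),
  total_count_of_group = rowSums(counts),
  wide
)
rownames(result) <- NULL
result
```

Because the columns come from `colnames(counts)`, this also handles more than two distinct values of var1. The group pair is kept as a single `group` column here rather than as two separate columns.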

Answer 4: (score: 1)

Here is a tidyverse solution:

library(tidyverse)
dataset %>%
  group_by(group1, group2) %>%
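  # per group: total row count, plus a one-row tibble of var1 counts in wide form
  summarize(total = n(), x = list(table(var1) %>% as_tibble %>% spread(var1, n))) %>%
  unnest(x)  # expand the list column into regular columns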

# # A tibble: 2 x 5
# # Groups:   group1 [2]
#   group1 group2 total   `1`   `2`
#   <fct>  <fct>  <int> <int> <int>
# 1 b      g          4     2     2
# 2 x      y          5     3     2