Collapse rows of a dplyr tibble based on cumulative frequency

Time: 2017-02-02 16:46:45

Tags: r dplyr

I am using dplyr to count the rows in each category and order the categories by count.

# Generate some data
set.seed(1234)
rows <- 100
created_data <- data.frame(index = 1:rows,
                           catsA = sample(letters[1:5], rows, replace = TRUE),
                           valueA = round(rnorm(rows), 3))

Building the frequency table:

library(dplyr)

count_of_cat <- created_data %>% 
  group_by(catsA) %>%
  summarise(rowcount = n()) %>%
  ungroup %>%
  arrange(-rowcount) %>%
  mutate(rel.freq = round(rowcount/sum(rowcount),3)) %>%
  mutate(cum.freq = cumsum(rel.freq))

Is there a good way to then collapse the rows after, say, cum.freq > 0.50?

Current output of count_of_cat (the rows to collapse are those after cum.freq first exceeds 0.50):

  catsA rowcount rel.freq cum.freq
1     b       26     0.26     0.26
2     a       25     0.25     0.51
3     c       17     0.17     0.68
4     d       17     0.17     0.85
5     e       15     0.15     1.00

1 Answer:

Answer 0: (score: 1)

Adapted from: dplyr mutate rowSums calculations or custom functions

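# Keep the rows up to the cutoff as they are
count_of_cat %>% filter(cum.freq <= 0.51) %>%
  rbind(
    # Collapse the remaining rows into a single "new" category
    count_of_cat %>% filter(cum.freq > 0.51) %>%
      summarise(catsA = "new",
                rowcount = sum(rowcount),
                rel.freq = sum(rel.freq),
                cum.freq = 1.00)
  )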
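
The answer above hard-codes the cutoff 0.51, which was read off the table and relies on an exact floating-point comparison. A possible variation, sketched here under assumptions, derives the cutoff from the threshold itself: a row is kept when the cumulative frequency before it is still below 0.50, and all later rows are relabelled and summed. The names threshold and collapsed, and the label "new" for the merged row, are illustrative choices rather than part of the original answer. The frequency table is rebuilt from the counts shown in the question so the snippet runs on its own.

```r
library(dplyr)

# Counts copied from the question's table (assumed already sorted by count)
count_of_cat <- data.frame(
  catsA    = c("b", "a", "c", "d", "e"),
  rowcount = c(26L, 25L, 17L, 17L, 15L),
  stringsAsFactors = FALSE
) %>%
  mutate(rel.freq = rowcount / sum(rowcount),
         cum.freq = cumsum(rel.freq))

threshold <- 0.50  # hypothetical parameter name

collapsed <- count_of_cat %>%
  # Keep a category while the cumulative frequency *before* it is below
  # the threshold; relabel every later category as "new"
  mutate(catsA = if_else(lag(cum.freq, default = 0) < threshold,
                         catsA, "new")) %>%
  group_by(catsA) %>%
  summarise(rowcount = sum(rowcount),
            rel.freq = sum(rel.freq)) %>%
  # Put the merged row last, the kept rows in descending order of count
  arrange(catsA == "new", desc(rowcount)) %>%
  # Recompute the cumulative frequency on the collapsed table
  mutate(cum.freq = cumsum(rel.freq))

print(collapsed)
```

With the counts from the question this keeps b and a and merges c, d and e into one row with rowcount 49. Comparing the lagged cumulative frequency against the threshold avoids testing for equality with a value such as 0.51, which can be unreliable for doubles.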