R mutate to calculate the relative frequency of occurrence within groups

Time: 2019-07-02 03:04:21

Tags: r tidyverse

I need some help.

Suppose I have this:

# A tibble: 10 x 3
   a         b c    
   <chr> <dbl> <lgl>
 1 a         1 TRUE 
 2 a         1 TRUE 
 3 a         1 TRUE 
 4 a         2 TRUE 
 5 a         2 TRUE 
 6 a         2 FALSE
 7 a         2 FALSE
 8 a         3 FALSE
 9 a         3 FALSE
10 a         3 FALSE

structure(list(a = c("a", "a", "a", "a", "a", "a", "a", "a", 
"a", "a"), b = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3), c = c(TRUE, TRUE, 
TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

I want to group_by b and, within each group, calculate the relative frequency of c == TRUE, to produce column d.

So I want this output:

# A tibble: 10 x 4
   a         b c         d
   <chr> <dbl> <lgl> <dbl>
 1 a         1 TRUE    1  
 2 a         1 TRUE    1  
 3 a         1 TRUE    1  
 4 a         2 TRUE    0.5
 5 a         2 TRUE    0.5
 6 a         2 FALSE   0.5
 7 a         2 FALSE   0.5
 8 a         3 FALSE   0  
 9 a         3 FALSE   0  
10 a         3 FALSE   0  

structure(list(a = c("a", "a", "a", "a", "a", "a", "a", "a", 
"a", "a"), b = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3), c = c(TRUE, TRUE, 
TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE), d = c(1, 
1, 1, 0.5, 0.5, 0.5, 0.5, 0, 0, 0)), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

A dplyr / tidyverse solution is preferred.

I have tried:

#1
t %>% 
  group_by(b) %>%
  mutate(
    d = nrow(c[c == T])/nrow()
  )
#2
t %>% 
  group_by(b) %>%
  mutate(
    d = count(c[c == T])/count()
  )
#3 
t %>% 
  group_by(b) %>%
  mutate(
    d = nrow(any(c[c == T]))/nrow(any())
  )

None of them work.

Similar questions (but different):
How to calculate the relative frequency per groups
R: relative frequency in r by factor

Thanks for your help.
Thanks in advance.

2 Answers:

Answer 0: (score: 2)

A general way to find the proportion of rows in each group where a variable takes a given value is

df %>%
  group_by(b) %>%
  mutate(d = sum(c == TRUE)/n())

But since c is a logical vector, we can also take the sum of c directly and divide it by the number of rows in the group.

library(dplyr)

df %>%
  group_by(b) %>%
  mutate(d = sum(c)/n())

#   a         b c         d
#   <chr> <dbl> <lgl> <dbl>
# 1 a         1 TRUE    1  
# 2 a         1 TRUE    1  
# 3 a         1 TRUE    1  
# 4 a         2 TRUE    0.5
# 5 a         2 TRUE    0.5
# 6 a         2 FALSE   0.5
# 7 a         2 FALSE   0.5
# 8 a         3 FALSE   0  
# 9 a         3 FALSE   0  
#10 a         3 FALSE   0  
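
As a quick check that the two forms above agree, here is a minimal sketch that rebuilds the question's data and computes both in a single mutate(). The column names d_compare and d_sum are made up for illustration and are not part of the original answer; the trailing ungroup() is optional.

```r
library(dplyr)

# Data from the question, rebuilt as a plain data.frame
df <- data.frame(
  a = rep("a", 10),
  b = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3),
  c = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)
)

res <- df %>%
  group_by(b) %>%
  mutate(
    d_compare = sum(c == TRUE) / n(),  # explicit comparison (hypothetical name)
    d_sum     = sum(c) / n()           # relies on TRUE being counted as 1 (hypothetical name)
  ) %>%
  ungroup()

# Both columns should hold the same per-group proportions
identical(res$d_compare, res$d_sum)
```

The explicit comparison form also works when the column is not logical, for example sum(x == "yes") / n(), whereas the shorter form only makes sense for logical columns.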

Answer 1: (score: 1)

After grouping by 'b', we can simply take the mean of 'c'

library(dplyr)
df1 %>%
    group_by(b) %>% 
    mutate(d = mean(c))
# A tibble: 10 x 4
# Groups:   b [3]
#   a         b c         d
#   <chr> <dbl> <lgl> <dbl>
# 1 a         1 TRUE    1  
# 2 a         1 TRUE    1  
# 3 a         1 TRUE    1  
# 4 a         2 TRUE    0.5
# 5 a         2 TRUE    0.5
# 6 a         2 FALSE   0.5
# 7 a         2 FALSE   0.5
# 8 a         3 FALSE   0  
# 9 a         3 FALSE   0  
#10 a         3 FALSE   0  

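Note: mean is, by definition, the "average" you are used to: add up all the numbers and divide by how many there are. For a logical vector, TRUE counts as 1 and FALSE as 0, so the mean is the proportion of TRUE values.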
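
To make the arithmetic concrete, here is a small base R sketch for the group b == 2 from the question. The vector name x is only for illustration.

```r
# Values of c in group b == 2, taken from the question's data
x <- c(TRUE, TRUE, FALSE, FALSE)

as.integer(x)        # 1 1 0 0
sum(x) / length(x)   # (1 + 1 + 0 + 0) / 4 = 0.5
mean(x)              # 0.5, the same calculation in one call
```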


Another option is

df1 %>%
   group_by(b) %>%
   mutate(d = sum(as.integer(c))/n())

Or using data.table

library(data.table)
setDT(df1)[, d := mean(c), by = b]

Or using base R

df1$d <- with(df1, ave(c, b))
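
The base R line can be tried without installing any packages. The sketch below rebuilds the question's data as a plain data.frame and compares the result with the d column the question asks for. The name expected_d is introduced here for illustration only. ave() applies mean by default, which is why no FUN argument is needed.

```r
# Data from the question, rebuilt as a plain data.frame
df1 <- data.frame(
  a = rep("a", 10),
  b = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3),
  c = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)
)

# Desired d column, copied from the question (hypothetical variable name)
expected_d <- c(1, 1, 1, 0.5, 0.5, 0.5, 0.5, 0, 0, 0)

# ave() returns a vector as long as c, holding each row's group mean
df1$d <- with(df1, ave(c, b))

# Compare against the output requested in the question
all.equal(df1$d, expected_d)
```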