Using ddply summarize to apply a function to find frequency percentages

Time: 2017-03-31 18:10:20

Tags: r statistics plyr summarize

I have a table similar to this:

id   name    gender     age      count
1    apple    Male      13-20      25
1    apple    Male      21-40      30
1    apple    Female    13-20      60
1    apple    Female    21-40      42
2    banana   Male      13-20      45
2    banana   Male      21-40      12
2    banana   Female    13-20      22
2    banana   Female    21-40      74
3    orange   Male      13-20      52
3    orange   Male      21-40      25
3    orange   Female    13-20      30
3    orange   Female    21-40      48

I want a table of frequencies relative to apple (the reference group), like this:

id   gender   banana_wrt_apple   orange_wrt_apple
1    Male     57/55              77/55
2    Female   96/102             78/102

How can I do this using ddply?

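1 answer:

Answer 0 (score: 0)

Switch to dplyr. It is the "replacement" for plyr.

You can use summarise twice, but that takes a fair amount of hacking. To be honest, this is a poor data format.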

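library(dplyr)

df %>% 
  group_by(gender, name) %>%
  # total count per gender and fruit
  summarise(tot_count = sum(count)) %>%
  group_by(gender) %>%
  # within each gender, rows follow the factor levels of name: apple, banana, orange
  do(data.frame(banana_wrt_apple = paste0(.$tot_count[2], "/", .$tot_count[1]),
                orange_wrt_apple = paste0(.$tot_count[3], "/", .$tot_count[1])))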

  gender banana_wrt_apple orange_wrt_apple
  <fctr>            <chr>            <chr>
1 Female           96/102           78/102
2   Male            57/55            77/55
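
The pipeline above picks apple, banana and orange by their position in tot_count, so it depends on the level order of name. As a hedged alternative, here is a minimal base R sketch that looks up the reference group by name instead. It rebuilds the question's data with character columns so that it runs on its own. The names totals, ref, others and result, and the extra numeric *_ratio columns, are illustrative assumptions and not part of the original answer.

```r
# Same values as the question's table, rebuilt compactly (character columns).
df <- data.frame(
  id     = rep(1:3, each = 4),
  name   = rep(c("apple", "banana", "orange"), each = 4),
  gender = rep(rep(c("Male", "Female"), each = 2), times = 3),
  age    = rep(c("13-20", "21-40"), times = 6),
  count  = c(25, 30, 60, 42, 45, 12, 22, 74, 52, 25, 30, 48),
  stringsAsFactors = FALSE
)

# Sum count over age: one row per gender, one column per fruit.
totals <- xtabs(count ~ gender + name, data = df)

# Reference group, selected by name rather than by position (assumed name).
ref    <- "apple"
others <- setdiff(colnames(totals), ref)

result <- data.frame(gender = rownames(totals), stringsAsFactors = FALSE)
for (nm in others) {
  # "x/y" label in the format the question asks for
  result[[paste0(nm, "_wrt_", ref)]] <- paste0(totals[, nm], "/", totals[, ref])
  # the same comparison as a plain number, easier to reuse later
  result[[paste0(nm, "_ratio")]] <- as.numeric(totals[, nm] / totals[, ref])
}

print(result)
```

The banana_wrt_apple and orange_wrt_apple columns should match the labels in the output above. Changing ref to another fruit switches the reference group without touching any index.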

Data

df = structure(list(id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 
3L, 3L), name = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 
3L, 3L, 3L), .Label = c("apple", "banana", "orange"), class = "factor"), 
    gender = structure(c(2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 
    2L, 1L, 1L), .Label = c("Female", "Male"), class = "factor"), 
    age = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
    1L, 2L), .Label = c("13-20", "21-40"), class = "factor"), 
    count = c(25L, 30L, 60L, 42L, 45L, 12L, 22L, 74L, 52L, 25L, 
    30L, 48L)), .Names = c("id", "name", "gender", "age", "count"
), class = "data.frame", row.names = c(NA, -12L))