I have a table similar to this:

id name   gender age   count
1  apple  Male   13-20 25
1  apple  Male   21-40 30
1  apple  Female 13-20 60
1  apple  Female 21-40 42
2  banana Male   13-20 45
2  banana Male   21-40 12
2  banana Female 13-20 22
2  banana Female 21-40 74
3  orange Male   13-20 52
3  orange Male   21-40 25
3  orange Female 13-20 30
3  orange Female 21-40 48

I want a table of frequencies relative to apple (the reference group), like this:

id gender banana_wrt_apple orange_wrt_apple
1  Male   57/55            77/55
2  Female 96/102           78/102

How can I do this with plyr's ddply?
Answer 0 (score: 0)

Switch to dplyr. It is the "replacement" for plyr. You could use summarise twice, but that requires a lot of hacking. To be honest, this is a bad data format.
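library(dplyr)

df %>%
  # total count per gender and fruit, summed over the age groups
  group_by(gender, name) %>%
  summarise(tot_count = sum(count)) %>%
  # build one output row per gender
  group_by(gender) %>%
  # tot_count follows the factor level order of name: apple, banana, orange
  do(data.frame(banana_wrt_apple = paste0(.$tot_count[2], "/", .$tot_count[1]),
                orange_wrt_apple = paste0(.$tot_count[3], "/", .$tot_count[1])))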
  gender banana_wrt_apple orange_wrt_apple
  <fctr>            <chr>            <chr>
1 Female           96/102           78/102
2   Male            57/55            77/55
Data
df = structure(list(id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L,
3L, 3L), name = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L), .Label = c("apple", "banana", "orange"), class = "factor"),
gender = structure(c(2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L,
2L, 1L, 1L), .Label = c("Female", "Male"), class = "factor"),
age = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L), .Label = c("13-20", "21-40"), class = "factor"),
count = c(25L, 30L, 60L, 42L, 45L, 12L, 22L, 74L, 52L, 25L,
30L, 48L)), .Names = c("id", "name", "gender", "age", "count"
), class = "data.frame", row.names = c(NA, -12L))
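
The dplyr pipeline in the answer picks the totals by position, which works only because the levels of name happen to be ordered apple, banana, orange. As a hedged alternative, not the answer's method, here is a minimal base R sketch that looks the fruits up by name instead. It rebuilds the same data in compact form so that it runs on its own; the names tot, ref, others and res are illustrative only.

```r
# Rebuild the example data in compact form (same values as the dput() output above).
df <- data.frame(
  id     = rep(1:3, each = 4),
  name   = factor(rep(c("apple", "banana", "orange"), each = 4)),
  gender = factor(rep(rep(c("Male", "Female"), each = 2), times = 3)),
  age    = factor(rep(c("13-20", "21-40"), times = 6)),
  count  = c(25L, 30L, 60L, 42L, 45L, 12L, 22L, 74L, 52L, 25L, 30L, 48L)
)

# Total count per gender (rows) and fruit (columns), summed over the age groups.
tot <- xtabs(count ~ gender + name, data = df)

# Select columns by name, so the result does not depend on factor level order.
ref    <- "apple"                          # reference group (assumed from the question)
others <- setdiff(colnames(tot), ref)      # every fruit except the reference

# One row per gender, one "<fruit>_wrt_<ref>" column per non-reference fruit.
res <- data.frame(gender = rownames(tot), stringsAsFactors = FALSE)
for (fruit in others) {
  res[[paste0(fruit, "_wrt_", ref)]] <- paste0(tot[, fruit], "/", tot[, ref])
}
res
```

Here res should hold the same ratio strings as the dplyr output, with gender as a plain character column. Changing ref to another fruit switches the reference group without touching the rest of the code.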