Group by, then add a ratio column based on a condition

Time: 2015-03-10 20:50:41

Tags: r aggregate

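Say my data frame in R looks like the one below. Sex is male/female. FamilySize is the number of family members who share the same surname. Surname is the family name.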

Sex         FamilySize  Surname  
male        1           Abbing  
female      3           Abbott  
male        3           Abbott  
male        3           Abbott  
male        1           Abelseth  
female      1           Abelseth  
male        2           Abelson  
female      2           Abelson  
male        1           Abrahamsson  
female      1           Abrahim 

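I want to add a new column, FemaleToFamilySizeRatio, that gives the ratio of the number of females to the family size for each family. The result should look like this: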

Sex         FamilySize  Surname     Ratio  
male        1           Abbing      0  
female      3           Abbott      0.33  
male        3           Abbott      0.33  
male        3           Abbott      0.33  
male        1           Abelseth    0.5  
female      1           Abelseth    0.5  
male        2           Abelson     0.5  
female      2           Abelson     0.5  
male        1           Abrahamsson 0  
female      1           Abrahim     0  

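I have played around with table and aggregate, and the most promising was ddply. I have gotten stuck, and some direction would help, because if I keep going my code will only get longer and uglier.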

3 Answers:

Answer 0: (score: 3)

You can do this with data.table:

library(data.table)
table_family <- data.table(table_input)
table_family[, Ratio := sum(Sex == "female") / FamilySize[1], by = "Surname"]
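
To try this end to end, here is a minimal sketch that rebuilds the ten sample rows from the question. The name `table_input` is carried over from the answer; the inline data frame is an assumption standing in for however the real data is loaded.

```r
library(data.table)

# Sample rows copied from the question; in practice table_input is your own data.
table_input <- data.frame(
  Sex = c("male", "female", "male", "male", "male",
          "female", "male", "female", "male", "female"),
  FamilySize = c(1, 3, 3, 3, 1, 1, 2, 2, 1, 1),
  Surname = c("Abbing", "Abbott", "Abbott", "Abbott", "Abelseth",
              "Abelseth", "Abelson", "Abelson", "Abrahamsson", "Abrahim"),
  stringsAsFactors = FALSE
)

table_family <- data.table(table_input)

# Per surname: number of females divided by that family's FamilySize.
# FamilySize[1] is used because the value is assumed constant within a family.
table_family[, Ratio := sum(Sex == "female") / FamilySize[1], by = "Surname"]

print(table_family)
```

Because `:=` adds the column by reference, the rows stay in their original order and no separate join step is needed.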

Answer 1: (score: 2)

Here is a solution using the base R functions aggregate and merge.

File dat.csv:

 Sex,FamilySize,Surname
 male,1,Abbing
 female,3,Abbott
 male,3,Abbott
 male,3,Abbott
 male,1,Abelseth
 female,1,Abelseth
 male,2,Abelson
 female,2,Abelson
 male,1,Abrahamsson
 female,1,Abrahim

Code:

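 # Read the sample data shown above
 d <- read.csv('dat.csv')

 # Count the females in each (FamilySize, Surname) group
 num_fem <- aggregate(Sex ~ ., data = d, FUN = function(x) sum(x == 'female'))
 # Turn the count into a ratio, keeping Surname as the merge key
 d_rat <- with(num_fem, data.frame(Ratio = Sex / FamilySize, Surname = Surname))

 # Join the per-family ratio back onto every row
 merge(d, d_rat)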

 #       Surname    Sex FamilySize     Ratio
 #1       Abbing   male          1 0.0000000
 #2       Abbott female          3 0.3333333
 #3       Abbott   male          3 0.3333333
 #4       Abbott   male          3 0.3333333
 #5     Abelseth   male          1 1.0000000
 #6     Abelseth female          1 1.0000000
 #7      Abelson   male          2 0.5000000
 #8      Abelson female          2 0.5000000
 #9  Abrahamsson   male          1 0.0000000
 #10     Abrahim female          1 1.0000000
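
Note that this output gives 1.0 for Abelseth and Abrahim, whereas the question's expected table lists 0.5 and 0. The sample lists FamilySize as 1 for both Abelseth rows, so dividing by FamilySize cannot produce 0.5. If the intent is the share of females among the rows present for each surname (an assumption about the intent, not something the asker confirmed), one possible alternative is to divide by the group's row count with base R's `ave`. A minimal sketch, with the sample rows rebuilt inline:

```r
# Sample rows copied from the question.
d <- data.frame(
  Sex = c("male", "female", "male", "male", "male",
          "female", "male", "female", "male", "female"),
  FamilySize = c(1, 3, 3, 3, 1, 1, 2, 2, 1, 1),
  Surname = c("Abbing", "Abbott", "Abbott", "Abbott", "Abelseth",
              "Abelseth", "Abelson", "Abelson", "Abrahamsson", "Abrahim"),
  stringsAsFactors = FALSE
)

# ave() applies FUN within each Surname group and returns a vector
# aligned with the rows of d, so no merge step is required.
# The mean of a 0/1 indicator is the share of females among the group's rows.
d$Ratio <- ave(as.numeric(d$Sex == "female"), d$Surname, FUN = mean)

print(d)
```

With this variant Abelseth comes out as 0.5, while Abrahim is still 1 because its only row is female. Unlike `merge`, `ave` leaves the row and column order of `d` unchanged.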

Answer 2: (score: 2)

Using dplyr:

library(dplyr)
table_family %>%
    group_by(Surname) %>%
    mutate(Ratio = sum(Sex == "female") / FamilySize)
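
A self-contained sketch of the same pipeline, assuming the sample rows from the question are held in a plain data frame (the construction below and the name `result` are illustrative). An `ungroup()` call is added at the end, which the answer does not include, so that later operations do not silently stay grouped by Surname.

```r
library(dplyr)

# Sample rows copied from the question.
table_family <- data.frame(
  Sex = c("male", "female", "male", "male", "male",
          "female", "male", "female", "male", "female"),
  FamilySize = c(1, 3, 3, 3, 1, 1, 2, 2, 1, 1),
  Surname = c("Abbing", "Abbott", "Abbott", "Abbott", "Abelseth",
              "Abelseth", "Abelson", "Abelson", "Abrahamsson", "Abrahim"),
  stringsAsFactors = FALSE
)

result <- table_family %>%
  group_by(Surname) %>%
  # sum() is evaluated once per surname; FamilySize is taken row by row,
  # which is fine as long as it is constant within a family.
  mutate(Ratio = sum(Sex == "female") / FamilySize) %>%
  ungroup()

print(result)
```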