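Say my data frame in R looks like the one below. Sex is male/female. FamilySize is the number of family members who share the same surname. Surname is the family name.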
Sex FamilySize Surname
male 1 Abbing
female 3 Abbott
male 3 Abbott
male 3 Abbott
male 1 Abelseth
female 1 Abelseth
male 2 Abelson
female 2 Abelson
male 1 Abrahamsson
female 1 Abrahim
I want to add a new column, FemaleToFamilySizeRatio, that gives the ratio of females to family size within each family. The result should look like this:
Sex FamilySize Surname Ratio
male 1 Abbing 0
female 3 Abbott 0.33
male 3 Abbott 0.33
male 3 Abbott 0.33
male 1 Abelseth 0.5
female 1 Abelseth 0.5
male 2 Abelson 0.5
female 2 Abelson 0.5
male 1 Abrahamsson 0
female 1 Abrahim 0
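I have played with table and aggregate, and the most promising candidate so far is ddply. I have hit a dead end, so some direction would help, because if I carry on the way I am going my code will only get long and ugly.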
Answer 0: (score: 3)
You can do this with data.table:
library(data.table)
table_family <- data.table(table_input)
table_family[, Ratio := sum(Sex == "female") / FamilySize[1], by = "Surname"]
Answer 1: (score: 2)
Here is a solution that uses the base R functions aggregate and merge.
File dat.csv:
Sex,FamilySize,Surname
male,1,Abbing
female,3,Abbott
male,3,Abbott
male,3,Abbott
male,1,Abelseth
female,1,Abelseth
male,2,Abelson
female,2,Abelson
male,1,Abrahamsson
female,1,Abrahim
Code:
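# Read the sample data
d <- read.csv('dat.csv')
# Count the females for each (FamilySize, Surname) combination
num_fem <- aggregate(Sex ~ ., data = d, FUN = function(x) sum(x == 'female'))
# Divide the count by the family size to get the per-family ratio
d_rat <- with(num_fem, data.frame(Ratio = Sex / FamilySize, Surname = Surname))
# Attach the ratio to every row of the original data, matching on Surname
merge(d, d_rat)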
# Surname Sex FamilySize Ratio
#1 Abbing male 1 0.0000000
#2 Abbott female 3 0.3333333
#3 Abbott male 3 0.3333333
#4 Abbott male 3 0.3333333
#5 Abelseth male 1 1.0000000
#6 Abelseth female 1 1.0000000
#7 Abelson male 2 0.5000000
#8 Abelson female 2 0.5000000
#9 Abrahamsson male 1 0.0000000
#10 Abrahim female 1 1.0000000
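As a possible variation on the aggregate and merge approach, the same per-family count can be sketched in base R with ave(), which returns a vector already aligned with the original rows, so no merge step is needed. This is only a sketch: the data frame is rebuilt inline from the sample in the question, and the variable name females_per_family is made up for illustration.

```r
# Sample data copied from the question (built inline instead of reading dat.csv)
d <- data.frame(
  Sex = c("male", "female", "male", "male", "male",
          "female", "male", "female", "male", "female"),
  FamilySize = c(1, 3, 3, 3, 1, 1, 2, 2, 1, 1),
  Surname = c("Abbing", "Abbott", "Abbott", "Abbott", "Abelseth",
              "Abelseth", "Abelson", "Abelson", "Abrahamsson", "Abrahim"),
  stringsAsFactors = FALSE
)

# ave() applies FUN within each Surname group and returns one value per
# original row, so the per-family female count lines up with d directly.
# females_per_family is a hypothetical helper name.
females_per_family <- ave(as.numeric(d$Sex == "female"), d$Surname, FUN = sum)

# Divide by the FamilySize column, as the other answers do
d$Ratio <- females_per_family / d$FamilySize
print(d)
```

On the sample data this should give the same ratios as the merge output above, while keeping the rows in their original order.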
Answer 2: (score: 2)
Using dplyr:
library(dplyr)
table_family %>%
  group_by(Surname) %>%
  mutate(Ratio = sum(Sex == "female") / FamilySize)