Here is my sample data:
D <- data.frame(Family=c("A","A","A","B","B","B","c","c","c"),
Name=c("Adam","Amy","Aaron","Bob","Brian","Brandon","Chris","Claire", "Chloe"),
State=c("CA","PA","TX","CA","CA","CA","MA","MI","FL"),
stringsAsFactors = FALSE)
Family Name State
1 A Adam CA
2 A Amy PA
3 A Aaron TX
4 B Bob CA
5 B Brian CA
6 B Brandon CA
7 c Chris MA
8 c Claire MI
9 c Chloe FL
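I need to create a function that identifies whether any family member in the same group, other than the focal row, lives in California (CA).
I have tried: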
require(dplyr)
D1 <- D %>% group_by(Family) %>%
mutate(Family.in.CA = any(State=="CA"))
Family Name State Family.in.CA
<chr> <chr> <chr> <lgl>
1 A Adam CA TRUE
2 A Amy PA TRUE
3 A Aaron TX TRUE
4 B Bob CA TRUE
5 B Brian CA TRUE
6 B Brandon CA TRUE
7 c Chris MA FALSE
8 c Claire MI FALSE
9 c Chloe FL FALSE
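But the function I want should return FALSE for Adam, because nobody in Adam's family other than Adam himself lives in California.
Update
To clear up the confusion: I am trying to compare each row with the other rows in the same group: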
#Adam checks whether Amy or Aaron is in CA == FALSE
#Amy checks whether Adam or Aaron is in CA == TRUE #Adam
#Aaron checks whether Adam or Amy is in CA == TRUE #Adam
#Bob checks whether Brian or Brandon is in CA == TRUE #Brian and Brandon
...
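Since the question asks for a function, here is a minimal base R sketch of the rule described above. It swaps the row-by-row comparison for a counting trick: count the CA rows in each family with ave(), then subtract the row's own contribution, so only other members are counted. The helper name others_in_state() and its target argument are hypothetical, not from the question, and the sketch assumes State contains no NA values.

```r
# Hypothetical helper (not from the question or answers): for each row,
# is at least one *other* member of the same family in `target`?
others_in_state <- function(state, family, target = "CA") {
  is_target <- state == target
  # ave() returns, for every row, the number of target rows in its family
  n_in_family <- ave(as.integer(is_target), family, FUN = sum)
  # Subtract the row's own contribution; anything left belongs to others
  (n_in_family - is_target) > 0
}

D <- data.frame(Family = c("A","A","A","B","B","B","c","c","c"),
                Name   = c("Adam","Amy","Aaron","Bob","Brian","Brandon",
                           "Chris","Claire","Chloe"),
                State  = c("CA","PA","TX","CA","CA","CA","MA","MI","FL"),
                stringsAsFactors = FALSE)

D$Family.in.CA <- others_in_state(D$State, D$Family)
print(D)
```

Because each row only subtracts itself, families with several CA members (like family B) still come out TRUE for every member.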
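Answer 0 (score: 1)
We can do this with base R. We split the dataset by 'Family', loop through the rows, check whether "CA" is %in% the 'State' values of every row except the current one, and then unsplit to get the output back as a vector in the original row order.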
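# Split D by Family, build one logical vector per family,
# then unsplit() puts the results back in the original row order
unsplit(lapply(split(D, D$Family), function(x)
  sapply(seq_len(nrow(x)), function(i) {
    x2 <- as.character(x$State[-i])  # states of everyone except row i
    "CA" %in% x2
  })), D$Family)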
#[1] FALSE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE
With data.table, this can be done in one line:
library(data.table)
setDT(D)[, Family.in.CA := unlist(lapply(1:.N, function(i) "CA" %in% State[-i])), Family]
D
# Family Name State Family.in.CA
#1: A Adam CA FALSE
#2: A Amy PA TRUE
#3: A Aaron TX TRUE
#4: B Bob CA TRUE
#5: B Brian CA TRUE
#6: B Brandon CA TRUE
#7: c Chris MA FALSE
#8: c Claire MI FALSE
#9: c Chloe FL FALSE
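The per-row lapply() can also be replaced by a count-minus-self calculation inside the data.table grouping, which avoids building a leave-one-out vector for every row. This is a sketch under the assumption that State contains no NA values; it rebuilds D as a data.table so the block runs on its own.

```r
library(data.table)

D <- data.table(Family = c("A","A","A","B","B","B","c","c","c"),
                Name   = c("Adam","Amy","Aaron","Bob","Brian","Brandon",
                           "Chris","Claire","Chloe"),
                State  = c("CA","PA","TX","CA","CA","CA","MA","MI","FL"))

# Per family: total number of CA rows minus this row's own CA flag.
# A positive remainder means someone else in the family lives in CA.
D[, Family.in.CA := (sum(State == "CA") - (State == "CA")) > 0, by = Family]
print(D)
```

As with the original answer, `:=` adds the column by reference, so D itself is modified.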
Answer 1 (score: 1)
This isn't the prettiest dplyr code you'll ever see, but it gets the job done:
D %>% group_by(Family) %>%
mutate(Family.in.CA = list(as.character(State))) %>%
mutate(Family.in.CA =
mapply(function(xx, yy) "CA" %in% yy[-match(xx, yy)], State, Family.in.CA))
#Source: local data frame [9 x 4]
#Groups: Family [3]
#
# Family Name State Family.in.CA
# <chr> <chr> <chr> <lgl>
#1 A Adam CA FALSE
#2 A Amy PA TRUE
#3 A Aaron TX TRUE
#4 B Bob CA TRUE
#5 B Brian CA TRUE
#6 B Brandon CA TRUE
#7 c Chris MA FALSE
#8 c Claire MI FALSE
#9 c Chloe FL FALSE
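The first mutate collects, for each family, the states of all family members into a list column. The second then removes the state of the person in the current row from that collection and checks whether "CA" is in the remaining list, which represents the states of the other family members.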
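A small base R illustration of the removal step: match() returns only the first position of a value, so negative indexing drops exactly one copy of the current person's state. The hand-typed vectors states_A and states_B are assumptions standing in for the list column built by the first mutate.

```r
# Family B's states, as collected by the first mutate()
states_B <- c("CA", "CA", "CA")

# match() finds only the FIRST "CA", so one copy (Bob's) is dropped
# and Brian's and Brandon's entries remain
states_B[-match("CA", states_B)]
# [1] "CA" "CA"

# Family A: dropping Adam's "CA" leaves no CA at all ...
states_A <- c("CA", "PA", "TX")
"CA" %in% states_A[-match("CA", states_A)]
# [1] FALSE

# ... while dropping Amy's "PA" leaves Adam's "CA"
"CA" %in% states_A[-match("PA", states_A)]
# [1] TRUE
```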