Identifying characteristics of group members other than the focal observation {dplyr}

Time: 2016-07-20 02:09:11

Tags: r

Here is my sample data:

D <- data.frame(Family=c("A","A","A","B","B","B","c","c","c"),
           Name=c("Adam","Amy","Aaron","Bob","Brian","Brandon","Chris","Claire", "Chloe"),
           State=c("CA","PA","TX","CA","CA","CA","MA","MI","FL"),
            stringsAsFactors = FALSE)

  Family    Name State
1      A    Adam    CA
2      A     Amy    PA
3      A   Aaron    TX
4      B     Bob    CA
5      B   Brian    CA
6      B Brandon    CA
7      c   Chris    MA
8      c  Claire    MI
9      c   Chloe    FL

I need to create a function that identifies whether the "family members in the same group as the focal row" live in California.

I have tried:

require(dplyr)
D1 <- D %>% group_by(Family) %>%
mutate(Family.in.CA = any(State=="CA"))


  Family    Name  State Family.in.CA
  <fctr>  <fctr> <fctr>        <lgl>
1      A    Adam     CA         TRUE
2      A     Amy     PA         TRUE
3      A   Aaron     TX         TRUE
4      B     Bob     CA         TRUE
5      B   Brian     CA         TRUE
6      B Brandon     CA         TRUE
7      c   Chris     MA        FALSE
8      c  Claire     MI        FALSE
9      c   Chloe     FL        FALSE

But the function I want should return FALSE for Adam, because nobody in Adam's family other than Adam himself lives in California.

Update

Since the original post caused some confusion: I am trying to compare each row with the other rows in the same group.

#Adam checks whether Amy or Aaron is in CA == FALSE
#Amy checks whether Adam or Aaron is in CA == TRUE #Adam
#Aaron checks whether Adam or Amy is in CA == TRUE #Adam
#Bob checks whether Brian or Brandon is in CA == TRUE #Brian and Brandon
...
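The rule in the update amounts to a leave-one-out count: within each family, count the rows in "CA", subtract 1 if the current row is itself in "CA", and check whether anything is left. A minimal base R sketch of such a function, assuming the sample data above; `others_in_state()` is a hypothetical helper name, not something from the question or a package.

```r
# Hypothetical helper: TRUE for a row if at least one *other* row in the same
# group has `state` equal to `target`.
others_in_state <- function(state, group, target = "CA") {
  hit <- state == target                    # does this row itself match?
  # Per group: total matches minus this row's own match (0 or 1), then test > 0.
  ave(hit, group, FUN = function(h) sum(h) - h > 0)
}

D <- data.frame(Family = c("A","A","A","B","B","B","c","c","c"),
                Name   = c("Adam","Amy","Aaron","Bob","Brian","Brandon","Chris","Claire","Chloe"),
                State  = c("CA","PA","TX","CA","CA","CA","MA","MI","FL"),
                stringsAsFactors = FALSE)

D$Family.in.CA <- others_in_state(D$State, D$Family)
D$Family.in.CA
#[1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE
```

Because the subtraction is vectorized within each group, this avoids looping over rows; a single-member family always gets FALSE, since there is no one else to check.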

2 answers:

Answer 0 (score: 1)

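We can do this with base R. We split the dataset by 'Family', loop through the rows, check whether "CA" is %in% the 'State' values of every row except the current one, and then unsplit to get the output back as a single vector in the original row order.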

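# Split by Family; for each row i, check whether "CA" occurs among the other rows' states
unsplit(lapply(split(D, D$Family), function(x)
  sapply(seq_len(nrow(x)), function(i) {
    x2 <- as.character(x$State[-i])   # states of everyone in the family except row i
    "CA" %in% x2
  })), D$Family)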
#[1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE

If we use data.table, this can be done in one line:

library(data.table)
setDT(D)[, Family.in.CA := unlist(lapply(1:.N, function(i) "CA" %in% State[-i])), Family]
D
#   Family    Name State Family.in.CA
#1:      A    Adam    CA        FALSE
#2:      A     Amy    PA         TRUE
#3:      A   Aaron    TX         TRUE
#4:      B     Bob    CA         TRUE
#5:      B   Brian    CA         TRUE
#6:      B Brandon    CA         TRUE
#7:      c   Chris    MA        FALSE
#8:      c  Claire    MI        FALSE
#9:      c   Chloe    FL        FALSE

Answer 1 (score: 1)

This isn't the prettiest dplyr code you've ever seen, but it gets the job done:

D %>% group_by(Family) %>%
    mutate(Family.in.CA = list(as.character(State))) %>%
    mutate(Family.in.CA =
       mapply(function(xx, yy) "CA" %in% yy[-match(xx, yy)], State, Family.in.CA))

#Source: local data frame [9 x 4]
#Groups: Family [3]
#
#  Family    Name  State Family.in.CA
#  <fctr>  <fctr> <fctr>        <lgl>
#1      A    Adam     CA        FALSE
#2      A     Amy     PA         TRUE
#3      A   Aaron     TX         TRUE
#4      B     Bob     CA         TRUE
#5      B   Brian     CA         TRUE
#6      B Brandon     CA         TRUE
#7      c   Chris     MA        FALSE
#8      c  Claire     MI        FALSE
#9      c   Chloe     FL        FALSE

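The first mutate collects, for each family, the states of all its members. The second mutate then removes, for each row, that person's own state from the collection (match() drops the first entry equal to it, which leaves the same set of values as dropping the person's own entry) and checks whether "CA" is in what remains, i.e. among the states of the other family members.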
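Both answers build the "other members" explicitly, either by indexing with `[-i]` or through a list column. A sketch of a vectorized alternative in dplyr, assuming dplyr is installed (this is not taken from either answer): within each family, `sum(State == "CA")` counts the CA rows, and subtracting `(State == "CA")` removes the current row's own contribution.

```r
library(dplyr)

D <- data.frame(Family = c("A","A","A","B","B","B","c","c","c"),
                Name   = c("Adam","Amy","Aaron","Bob","Brian","Brandon","Chris","Claire","Chloe"),
                State  = c("CA","PA","TX","CA","CA","CA","MA","MI","FL"),
                stringsAsFactors = FALSE)

D1 <- D %>%
  group_by(Family) %>%
  # CA rows in the family minus this row's own CA (0 or 1); > 0 means someone else is in CA
  mutate(Family.in.CA = sum(State == "CA") - (State == "CA") > 0) %>%
  ungroup()

D1$Family.in.CA
#[1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE
```

Since `-` binds more tightly than `>` in R, no extra parentheses are needed around the subtraction.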