Question

I have a dataset like this:

Pt    EVENT
123    GGG
123    Nor
123    tre
144    GGG
1667   tre
1667   Nor
1667   tre

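I want to group the data and then spread it, as in my previous question, but I am only interested in the groups that contain Nor, so the final dataset should be: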

Pt    
123   GGG      Nor   tre
1667  tre      Nor   tre

I got an answer to the previous question, but I can't seem to select specific groups. This is what I tried:

Sankey<-EndoSubset %>%
  group_by(Pt) %>% 
  filter(EVENT == "Nor")

But obviously this only gives me the rows that contain Nor, whereas I want all rows for every Pt whose group contains Nor.

Answer 1

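We can try with data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)), use dcast to reshape from 'long' to 'wide', then check columns 2:4 for the value "Nor", combine the elementwise comparisons with Reduce, and use the resulting logical vector to subset the rows.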

library(data.table)
res <- dcast(setDT(df1), Pt~rowid(Pt), value.var="EVENT", fill="")
res[res[, Reduce(`|`, lapply(.SD,`==`, "Nor" )) , .SDcols = 2:4]]
#     Pt   1   2   3
#1:  123 GGG Nor tre
#2: 1667 tre Nor tre

A more compact option is to keep only the groups that contain "Nor" before the dcast step

dcast(setDT(df1)[, if(any(EVENT=="Nor")) .SD, Pt], Pt~rowid(Pt), value.var ="EVENT")
#    Pt   1   2   3
#1:  123 GGG Nor tre
#2: 1667 tre Nor tre

If we are using dplyr, the approach is similar to the second data.table option shown above

library(dplyr)
library(tidyr)
df1 %>%
   group_by(Pt) %>%
   filter(any(EVENT=="Nor")) %>% 
   mutate(n = row_number())  %>%
   ungroup() %>%
   spread(n, EVENT)
#    Pt     1     2     3
#  <int> <chr> <chr> <chr>
#1   123   GGG   Nor   tre
#2  1667   tre   Nor   tre
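
In recent tidyr releases, spread is superseded by pivot_wider. As a minimal sketch, the same pipeline could be written as follows. The df1 built here is an assumption, reconstructed from the sample data in the question, and res is just an illustrative name.

```r
library(dplyr)
library(tidyr)

# Assumed reconstruction of the question's sample data
df1 <- data.frame(
  Pt = c(123, 123, 123, 144, 1667, 1667, 1667),
  EVENT = c("GGG", "Nor", "tre", "GGG", "tre", "Nor", "tre"),
  stringsAsFactors = FALSE
)

res <- df1 %>%
  group_by(Pt) %>%
  filter(any(EVENT == "Nor")) %>%   # keep whole groups that contain "Nor"
  mutate(n = row_number()) %>%      # position of each event within its group
  ungroup() %>%
  pivot_wider(names_from = n, values_from = EVENT)

res
```

The key step is filter(any(EVENT == "Nor")): inside a grouped filter, any() returns a single TRUE or FALSE per group, so every row of a matching Pt is kept rather than only the "Nor" rows.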

Or using only base R methods

reshape(transform(df1[with(df1, ave(EVENT=="Nor", Pt, FUN=any)),],
   time = ave(seq_along(EVENT), Pt, FUN = seq_along)), direction="wide", idvar="Pt")
#    Pt EVENT.1 EVENT.2 EVENT.3
#1  123     GGG     Nor     tre
#5 1667     tre     Nor     tre

Answer 2

Depending on the shape of your data, it may make sense to store the result as a list. Here is a base R way to do that:

# split data by Pt and check whether "Nor" occurs in that group's EVENT values
# if yes, collect the group's EVENT values into a character vector
myList <- lapply(split(df, df$Pt), function(i) {
                                  if("Nor" %in% as.character(i$EVENT)) {
                                    unlist(as.character(i$EVENT))}})
# drop list elements with length 0 (groups without "Nor" return NULL)
myList <- myList[lengths(myList) > 0]

This returns a named list:

myList
$`123`
[1] "GGG" "Nor" "tre"

$`1667`
[1] "tre" "Nor" "tre"

If you want a data.frame and the vectors are of equal length, as in the example, you can use rbind and do.call:

data.frame(do.call(rbind, myList))
      X1  X2  X3
123  GGG Nor tre
1667 tre Nor tre

Note that Pt is now held in the row names.

Thanks to @RichardScriven for reminding me of lengths.

Answer 3

Another option using base R

summary_df <- aggregate(data=df,EVENT~.,FUN = paste,collapse=",")
summary_df[grep("Nor",summary_df$EVENT),]
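
Note that grep("Nor", ...) matches substrings, so an event such as "Normal" would also select its group. A minimal sketch of a stricter variant, assuming the sample data from the question (df is reconstructed here and keep is an illustrative name), splits the collapsed string again and tests for an exact match:

```r
# Assumed reconstruction of the question's sample data
df <- data.frame(
  Pt = c(123, 123, 123, 144, 1667, 1667, 1667),
  EVENT = c("GGG", "Nor", "tre", "GGG", "tre", "Nor", "tre"),
  stringsAsFactors = FALSE
)

# One row per Pt, with that group's events collapsed into a single string
summary_df <- aggregate(EVENT ~ Pt, data = df, FUN = paste, collapse = ",")

# Exact match on individual events instead of a substring search
keep <- vapply(
  strsplit(summary_df$EVENT, ",", fixed = TRUE),
  function(ev) "Nor" %in% ev,
  logical(1)
)
summary_df[keep, ]
```

This only works if no event name itself contains a comma; otherwise a different separator would be needed.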

How to group_by only when the group contains a specific value in a specific column