I have data in which the rows are grouped:
df <- data.frame(group_id= c(1, 1, 1, 1, 2, 1, 2, 3, 4),
words = c("beach", "sand", "trip", "warm","travel", "water","beach","sand", "trees"),
ID = c("vacation", "vacation", "vacation", "vacation", "meeting","vacation","meeting","onduty", "hiking"))
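group_id is the group for the ID column. Now, for each group, I want to check for certain patterns ("beach", "warm" or "sand"), print the matched patterns in a separate column, and flag the match in another column as 0 (no match) or 1 (match).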
Expected output:
  id words  ID       pattern           Match
1  1 beach  vacation Beach, sand, warm     1
2  1 sand   vacation Beach, sand, warm     1
3  1 trip   vacation Beach, sand, warm     1
4  1 warm   vacation Beach, sand, warm     1
5  2 travel meeting  Beach                 1
6  1 water  vacation Beach, sand, warm     1
7  2 beach  meeting  Beach                 1
8  3 sand   onduty   sand                  1
9  4 trees  hiking   0                     0
Answer 0 (score: 1)
ids <- df$ID[ grepl("^(beach|warm|sand)$",df$words) ]
df[df$ID %in% ids,]
# group_id words ID
#1 1 beach vacation
#2 1 sand vacation
#3 1 trip vacation
#4 1 warm vacation
#5 2 travel meeting
#6 1 water vacation
#7 2 beach meeting
#8 3 sand onduty
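The `^` and `$` anchors in the pattern above make `grepl` match whole words only. A minimal sketch of the difference, using a hypothetical `words` vector that is not part of the question's data:

```r
# Hypothetical sample values, chosen only to illustrate anchoring
words <- c("beach", "beaches", "sandy", "warm")

# Anchored: the entire string must equal one of the alternatives
anchored <- grepl("^(beach|warm|sand)$", words)

# Unanchored: any substring match counts
unanchored <- grepl("beach|warm|sand", words)

print(anchored)
print(unanchored)
```

With the anchored form, values such as "beaches" or "sandy" are not treated as matches, which is probably what is wanted when the column holds single words.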
Answer 1 (score: 1)
You can try the following. Find the unique group_id values associated with the key words, then subset df using [].
df[df$group_id %in% unique(df$group_id[df$words %in% c('beach', 'sand', 'warm')]),]
group_id words ID
1 1 beach vacation
2 1 sand vacation
3 1 trip vacation
4 1 warm vacation
5 2 travel meeting
6 1 water vacation
7 2 beach meeting
8 3 sand onduty
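The subset above drops the non-matching group instead of flagging it. If the 0/1 column from the question is wanted, one possible base R sketch is to keep every row and use `ave()` to spread a per-group flag. The names `keys` and `hit` are introduced here for illustration, and `stringsAsFactors = FALSE` is an added assumption so the code behaves the same on older R versions:

```r
df <- data.frame(
  group_id = c(1, 1, 1, 1, 2, 1, 2, 3, 4),
  words = c("beach", "sand", "trip", "warm", "travel", "water", "beach", "sand", "trees"),
  ID = c("vacation", "vacation", "vacation", "vacation", "meeting", "vacation",
         "meeting", "onduty", "hiking"),
  stringsAsFactors = FALSE
)

keys <- c("beach", "sand", "warm")

# TRUE for rows whose word is exactly one of the keys
hit <- df$words %in% keys

# ave() applies max() within each group_id and returns a vector aligned
# with the original rows: 1 if any row of the group matched, else 0
df$Match <- ave(as.integer(hit), df$group_id, FUN = max)

print(df)
```

This keeps the original row order and leaves the hiking group in the result with Match equal to 0.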
Answer 2 (score: 1)
Using sqldf: first select the group_id values whose words are in ('beach','sand','warm'), then select all rows that belong to those group_id values.
library(sqldf)
sqldf("select * from df where group_id IN(select group_id from df where words IN ('beach','sand','warm'))")
Answer 3 (score: 1)
I used dplyr and grep to get the desired result. Here is the code:
library(dplyr)
pattern <- c("Beach", "sand", "warm")
df <- data.frame(group_id= c(1, 1, 1, 1, 2, 1, 2, 3, 4),
words = c("beach", "sand", "trip", "warm","travel", "water","beach","sand", "trees"),
ID = c("vacation", "vacation", "vacation", "vacation", "meeting","vacation","meeting","onduty", "hiking"))
x <- df %>%
group_by(group_id) %>%
summarise(words = paste(words, collapse = " "))
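# grep() returns row positions in x, so map them back to the actual group_id values
y <- sapply(pattern, function(d) x$group_id[grep(paste0("\\b", d, "\\b"), x$words, ignore.case = TRUE)], simplify = FALSE)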
y <- setNames(unlist(y, use.names=F),rep(names(y), lengths(y)))
y <- data.frame(Match_pattern =names(y), group_id=y, row.names=NULL)
y <- y %>%
group_by(group_id) %>%
summarise(Match_pattern = paste(Match_pattern, collapse = ", "))
out <- merge(df, y, by = "group_id", all.x = T)
out$N <- ifelse(is.na(out$Match_pattern), 0, 1)
> out
group_id words ID Match_pattern N
1 1 sand vacation Beach, sand, warm 1
2 1 trip vacation Beach, sand, warm 1
3 1 warm vacation Beach, sand, warm 1
4 1 beach vacation Beach, sand, warm 1
5 1 water vacation Beach, sand, warm 1
6 2 beach meeting Beach 1
7 2 travel meeting Beach 1
8 3 sand onduty sand 1
9 4 trees hiking <NA> 0
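As a possible shorter variant of the same idea, the summarise-and-merge steps can be folded into a single grouped `mutate`. This is only a sketch under a few assumptions: it matches whole words exactly (no regex, no case folding), so the key is written as lowercase "beach"; it yields an empty string instead of NA for groups without a match; and the name `keys` is introduced here for illustration.

```r
library(dplyr)

df <- data.frame(
  group_id = c(1, 1, 1, 1, 2, 1, 2, 3, 4),
  words = c("beach", "sand", "trip", "warm", "travel", "water", "beach", "sand", "trees"),
  ID = c("vacation", "vacation", "vacation", "vacation", "meeting", "vacation",
         "meeting", "onduty", "hiking"),
  stringsAsFactors = FALSE
)

keys <- c("beach", "sand", "warm")

out <- df %>%
  group_by(group_id) %>%
  mutate(
    # keys present anywhere in this group, listed in the order of `keys`
    Match_pattern = paste(intersect(keys, words), collapse = ", "),
    # 1 if at least one word in the group is a key, otherwise 0
    N = as.integer(any(words %in% keys))
  ) %>%
  ungroup()

print(out)
```

Unlike the `merge` approach, this keeps the rows in their original order, because nothing is re-joined.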