Extract matching words from a string

Time: 2016-09-09 07:35:42

Tags: regex r

I have a data frame; an abbreviated version is below:

structure(list(sex1 = c("totalmaleglobal", "totalfemaleglobal", 
"totalglobal", "totalfemaleGSK", "totalfemaleglobal", 
"totalfemaleUN")), .Names = "sex1", row.names = c(NA, 6L),
class="data.frame")

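I want to extract the words "total", "totalmale", "totalfemale", and so on.

How can I do this?

I tried regular expressions with the following code: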

pattern1=c("total")
pattern2=c("totalmale")
pattern3=c("totalfemale")

daly$sex <- str_extract(daly$sex1,pattern1)
daly$sex <- str_extract(daly$sex1,pattern2)
daly$sex <- str_extract(daly$sex1,pattern3)

But it gives me NA.

4 answers:

Answer 0: (score: 2)

Maybe:

library(stringr)
daly$sex <- str_extract(daly$sex1,paste(rev(mget(ls(pattern = "pattern\\d+"))), collapse="|"))
daly
#                sex1         sex
# 1   totalmaleglobal   totalmale
# 2 totalfemaleglobal totalfemale
# 3       totalglobal       total
# 4    totalfemaleGSK totalfemale
# 5 totalfemaleglobal totalfemale
# 6     totalfemaleUN totalfemale
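
The `rev()` in the call above matters: `ls()` returns the names `pattern1`, `pattern2`, `pattern3` in sorted order, and stringr's regex engine tries alternatives left to right, so the longer patterns have to come before `total`. The three calls in the question also overwrite `daly$sex` each time, so only the last one (`totalfemale`) survives, with `NA` wherever it does not match. A minimal sketch of the same idea without `mget()`, assuming the patterns are kept in one vector (the names `patterns`, `longest_first`, `alternation`, and `shadowed` are illustrative, not from the question):

```r
library(stringr)

# Same data as in the question
daly <- data.frame(
  sex1 = c("totalmaleglobal", "totalfemaleglobal", "totalglobal",
           "totalfemaleGSK", "totalfemaleglobal", "totalfemaleUN"),
  stringsAsFactors = FALSE
)

patterns <- c("total", "totalmale", "totalfemale")

# Put the longest pattern first so "total" cannot shadow the longer ones
longest_first <- patterns[order(nchar(patterns), decreasing = TRUE)]
alternation <- paste(longest_first, collapse = "|")

daly$sex <- str_extract(daly$sex1, alternation)

# For comparison: with "total" listed first, every row matches "total" only
shadowed <- str_extract(daly$sex1, paste(patterns, collapse = "|"))
```

Sorting by `nchar()` is only one way to get a safe order; it works here because every shorter pattern is a prefix of the longer ones.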

Answer 1: (score: 2)

Two steps with gsub:

v2 <- gsub(paste(v1, collapse='|'), '', d1$sex1)

gsub(paste(v2, collapse='|'), '', d1$sex1)
#[1] "totalmale"   "totalfemale" "total"       "totalfemale" "totalfemale" "totalfemale"

where

v1 <- c('total', 'totalmale', 'totalfemale')
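
To see why two steps work, it can help to look at the intermediate vector: the first `gsub` deletes the wanted prefix and leaves only the suffixes, and the second deletes those suffixes from the original strings. The sketch below assumes `d1` holds the question's data. It also lists the patterns longest first and sets `perl = TRUE`; both are additions to the answer's code, made so the result does not depend on how the regex engine resolves overlapping alternatives.

```r
# Assumption: d1 is the question's data frame
d1 <- data.frame(
  sex1 = c("totalmaleglobal", "totalfemaleglobal", "totalglobal",
           "totalfemaleGSK", "totalfemaleglobal", "totalfemaleUN"),
  stringsAsFactors = FALSE
)

# Longest pattern first (an addition; the answer lists "total" first)
v1 <- c("totalfemale", "totalmale", "total")

# Step 1: strip the prefix we want, keeping only the suffixes
v2 <- gsub(paste(v1, collapse = "|"), "", d1$sex1, perl = TRUE)

# Step 2: strip those suffixes from the original strings
res <- gsub(paste(unique(v2), collapse = "|"), "", d1$sex1, perl = TRUE)
```

This relies on every string having a non-empty suffix; a value that is exactly `"total"` would add an empty alternative to the second pattern.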

Answer 2: (score: 1)

Try this:

{{1}}

Answer 3: (score: 0)

We can also sapply grepl over the desired patterns (the vector s1) in base R, as follows:

x <- sapply(s1,function(x) grepl(x, d1$sex1))
colnames(x)[max.col(x, ties.method = "first")]

# [1] "totalmale" "totalfemale" "total" "totalfemale" "totalfemale" "totalfemale"

where

s1 <- c("totalmale", "totalfemale", "total")
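
Here `sapply` builds a logical matrix with one column per pattern, and `max.col(..., ties.method = "first")` picks the first column that is `TRUE` in each row, which is why `total` has to come last in `s1`. One caveat: a row that matches nothing is all `FALSE`, so the tie resolves to column 1 and the row would be labelled `totalmale`. A hedged sketch that returns `NA` instead, using a made-up extra value `"other"` that is not in the question's data (`fixed = TRUE` and the names `hits` and `idx` are also additions):

```r
s1 <- c("totalmale", "totalfemale", "total")

# Question's values plus one hypothetical non-matching value
sex1 <- c("totalmaleglobal", "totalfemaleglobal", "totalglobal",
          "totalfemaleGSK", "totalfemaleglobal", "totalfemaleUN",
          "other")

# One logical column per pattern; fixed = TRUE treats patterns as literals
hits <- sapply(s1, function(p) grepl(p, sex1, fixed = TRUE))

# First TRUE column per row (hits * 1 converts the matrix to numeric)
idx <- max.col(hits * 1, ties.method = "first")

# Rows with no match at all would otherwise default to column 1
idx[rowSums(hits) == 0] <- NA

res <- colnames(hits)[idx]
```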