I have a data frame; an abbreviated version is below:
structure(list(sex1 = c("totalmaleglobal", "totalfemaleglobal",
"totalglobal", "totalfemaleGSK", "totalfemaleglobal",
"totalfemaleUN")), .Names = "sex1", row.names = c(NA, 6L),
class="data.frame")
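I want to extract the words "total", "totalmale", "totalfemale", and so on.
How can I do this?
I tried regular expressions with the following code: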
pattern1=c("total")
pattern2=c("totalmale")
pattern3=c("totalfemale")
daly$sex <- str_extract(daly$sex1,pattern1)
daly$sex <- str_extract(daly$sex1,pattern2)
daly$sex <- str_extract(daly$sex1,pattern3)
But it gives me NA.
Answer 0 (score: 2)
Maybe combine the patterns into a single alternation, with the longer patterns first:
library(stringr)
daly$sex <- str_extract(daly$sex1,paste(rev(mget(ls(pattern = "pattern\\d+"))), collapse="|"))
daly
# sex1 sex
# 1 totalmaleglobal totalmale
# 2 totalfemaleglobal totalfemale
# 3 totalglobal total
# 4 totalfemaleGSK totalfemale
# 5 totalfemaleglobal totalfemale
# 6 totalfemaleUN totalfemale
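A likely reason for the NA values in the question: each assignment overwrites daly$sex, so only the result for pattern3 ("totalfemale") survives, and rows that do not contain it come back as NA. Below is a minimal base R sketch of the same alternation idea, with the alternatives written out explicitly instead of being collected from the workspace. The names `patterns`, `regex`, and `m` are illustrative only, and `perl = TRUE` is used on the assumption that left-to-right alternation (which, as far as I know, is also how stringr behaves) is what you want to reproduce.

```r
# Sample data from the question
daly <- data.frame(
  sex1 = c("totalmaleglobal", "totalfemaleglobal", "totalglobal",
           "totalfemaleGSK", "totalfemaleglobal", "totalfemaleUN"),
  stringsAsFactors = FALSE
)

# Illustrative names; sort so the longest alternative is tried first
patterns <- c("total", "totalmale", "totalfemale")
patterns <- patterns[order(nchar(patterns), decreasing = TRUE)]
regex <- paste(patterns, collapse = "|")  # "totalfemale|totalmale|total"

# regexpr() returns -1 for non-matches; start from NA so those rows stay NA
m <- regexpr(regex, daly$sex1, perl = TRUE)
daly$sex <- NA_character_
daly$sex[m > 0] <- regmatches(daly$sex1, m)
daly
```

Sorting by length means "total" can no longer win over "totalmale" or "totalfemale" just because it was listed first.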
Answer 1 (score: 2)
We could also use gsub:
v2 <- gsub(paste(v1, collapse='|'), '', d1$sex1)
gsub(paste(v2, collapse='|'), '', d1$sex1)
#[1] "totalmale" "totalfemale" "total" "totalfemale" "totalfemale" "totalfemale"
where
v1 <- c('total', 'totalmale', 'totalfemale')
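The two-step gsub approach builds its second regex from the leftover suffixes, so it relies on those suffixes being safe to use as a pattern. As a hedged alternative sketch (not the answerer's method), a single sub() call with a capture group can keep just the prefix. Here `d1` is assumed to be the question's data frame and `res` is an illustrative name.

```r
# Sample data from the question, stored as d1 to match this answer
d1 <- data.frame(
  sex1 = c("totalmaleglobal", "totalfemaleglobal", "totalglobal",
           "totalfemaleGSK", "totalfemaleglobal", "totalfemaleUN"),
  stringsAsFactors = FALSE
)

# Capture "total" plus an optional "female"/"male" and drop the rest.
# Strings that do not start with "total" are returned unchanged by sub().
res <- sub("^(total(?:female|male)?).*$", "\\1", d1$sex1, perl = TRUE)
res
```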
Answer 2 (score: 1)
Try this:
{{1}}
Answer 3 (score: 0)
We can also use sapply and grepl on the required patterns (the s1 vector) in base R, as follows:
x <- sapply(s1,function(x) grepl(x, d1$sex1))
colnames(x)[max.col(x, ties.method = "first")]
# [1] "totalmale" "totalfemale" "total" "totalfemale" "totalfemale" "totalfemale"
where
s1 <- c("totalmale", "totalfemale", "total")
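Two details of this approach are worth spelling out. The order of s1 matters, because "total" matches every row and ties.method = "first" picks the first TRUE column, so the longer patterns have to come first. Also, max.col() returns column 1 for a row with no TRUE at all, so a string that matches nothing would be labelled with the first pattern. The sketch below adds a guard for that case; the extra "otherUN" row and the name `hit` are made up for illustration.

```r
# Question data shortened, plus one hypothetical row that matches no pattern
d1 <- data.frame(
  sex1 = c("totalmaleglobal", "totalfemaleglobal", "totalglobal", "otherUN"),
  stringsAsFactors = FALSE
)
s1 <- c("totalmale", "totalfemale", "total")  # longest patterns first

# Logical matrix: one row per string, one column per pattern
x <- sapply(s1, function(p) grepl(p, d1$sex1, fixed = TRUE))

# First TRUE column per row, then blank out rows without any match
hit <- colnames(x)[max.col(x, ties.method = "first")]
hit[rowSums(x) == 0] <- NA
hit
```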