R: partial matching with multiple matches

Time: 2015-10-10 04:26:43

Tags: regex r string match grepl

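I am using the code below to do partial matching where each name gets exactly one match, but I have a follow-up question: suppose we add an extra criterion for fish and want "dog fish" to be classified as both fish and canine. Is this possible?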

d<-data.frame(name=c("brown cat", "blue cat", "big lion", "tall tiger", 
                 "black panther", "short cat", "red bird",
                 "short bird stuffed", "big eagle", "bad sparrow",
                 "dog fish", "head dog", "brown yorkie",
                 "lab short bulldog"), label=1:14)

# Define the regexes at the beginning of the code
regexes <- list(c("(cat|lion|tiger|panther)","feline"),
            c("(bird|eagle|sparrow)","avian"),
            c("(dog|yorkie|bulldog)","canine"))

# Create a vector, the same length as the df
output_vector <- character(nrow(d))

# For each regex..
for (i in seq_along(regexes)) {
  # Grep through d$name, and when you find matches, insert the relevant 'tag'
  # into the output vector
  output_vector[grepl(x = d$name, pattern = regexes[[i]][1])] <- regexes[[i]][2]
}

# Insert the now-filled output vector into the dataframe

d$species <- output_vector

Desired output

#                 name label species
#1           brown cat     1  feline
#2            blue cat     2  feline
#3            big lion     3  feline
#4          tall tiger     4  feline
#5       black panther     5  feline
#6           short cat     6  feline
#7            red bird     7   avian
#8  short bird stuffed     8   avian
#9           big eagle     9   avian
#10        bad sparrow    10   avian
#11           dog fish    11  canine, fish
#12           head dog    12  canine
#13       brown yorkie    13  canine
#14  lab short bulldog    14  canine
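
One possible way to get the multi-label row in the desired output, offered as a sketch rather than a definitive answer: the loop in the question overwrites any earlier tag, so a name can only ever carry one label. Appending to the existing tag instead of replacing it lets a name accumulate several. The fourth list entry, c("(fish)", "fish"), is an assumption added here for illustration; it is not part of the original code.

```r
d <- data.frame(name = c("brown cat", "blue cat", "big lion", "tall tiger",
                         "black panther", "short cat", "red bird",
                         "short bird stuffed", "big eagle", "bad sparrow",
                         "dog fish", "head dog", "brown yorkie",
                         "lab short bulldog"), label = 1:14)

# Same regex list as in the question, plus an assumed "fish" entry
regexes <- list(c("(cat|lion|tiger|panther)", "feline"),
                c("(bird|eagle|sparrow)", "avian"),
                c("(dog|yorkie|bulldog)", "canine"),
                c("(fish)", "fish"))  # assumption: not in the original code

output_vector <- character(nrow(d))

for (i in seq_along(regexes)) {
  # Rows whose name matches the i-th pattern
  hit <- grepl(pattern = regexes[[i]][1], x = as.character(d$name))
  # If a row has no tag yet, set it; otherwise append with ", "
  output_vector[hit] <- ifelse(output_vector[hit] == "",
                               regexes[[i]][2],
                               paste(output_vector[hit], regexes[[i]][2],
                                     sep = ", "))
}

d$species <- output_vector
```

With this change the order of tags in a multi-label row follows the order of the entries in regexes, so "dog fish" picks up "canine" first and "fish" second.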

The original Stack Overflow question is here: partial string matching r

0 answers:

No answers