Finding consecutive vowels with a regular expression

Asked: 2014-03-28 02:18:51

Tags: regex r sapply

I have the following code to find consecutive vowels, but it does not give me the correct result. Is my code wrong?

sapply(v, function(x){ gsub(".*[0-9]\\s", "", grep("[aeiou]{2}", x, value = TRUE, invert = FALSE)) })

where v is:

c("Joe 4311 rsfuvgcozbxwlnnfevze", "Clayton 2414 qsncnpvdfpjmvmvbdvce", 
"Addison 25 melmasilbgrurqbezgyu", "Donovan 2013 gozagvswtitjjinrzgup", 
"Sage 540 aamyvegiadwjwpvwtjko", "Zavier 133 cyomwtxftslukvmvpmcl", 
"Maria 1241 ngqjynxnpblcztnlkack", "Mercedes 2400 xcwbxxljspneilwejutw", 
"Micheal 4400 oovhyodyubhqwzdcwybf", "Brylee 2532 sarbmelbeycrnhytbout", 
"Giancarlo 3351 xmocyljxquklbchgmdcj", "Elin 5513 nbjovdtmijpfluzixebu", 
"Ray 2553 snrqrzshlzmmhumzlecl", "Jade 4030 rhibewstyrwdervgqnru", 
"Amelia 5205 lcnvnjhamhzavdfosmae", "Karissa 2030 vhvzyfckgogduqqayzku", 
"Conor 325 sbgfntjejbtwsvidvtnu", "Tripp 454 xmvuhycjnvqgnmorfdrl", 
"River 5120 zcxavkwzhwbvdqadajgh", "Tianna 251 mwoqwzyfddhuunmtiioh", 
"Conner 3543 ngyuzdbeyizfarxuxntz", "Mackenzie 3113 yvycqaquwtfjjtqsdduh", 
"Melody 4422 buagtfiaipniavdnsxhv", "Dallas 5343 blyjvtlpvpqondrdhluu")

In v, each line has the form "NAME SCORE WORD". We want to find how many lines have two consecutive vowels in WORD.

3 Answers:

Answer 0 (score: 4)

If you strsplit the text first, it becomes easier to apply grep to the WORD field alone:

v[grep("[aeiou]{2}",sapply(strsplit(v," "),"[",3))]

#[1] "Sage 540 aamyvegiadwjwpvwtjko"     
#[2] "Mercedes 2400 xcwbxxljspneilwejutw"
#[3] "Micheal 4400 oovhyodyubhqwzdcwybf" 
#[4] "Brylee 2532 sarbmelbeycrnhytbout"  
#[5] "Amelia 5205 lcnvnjhamhzavdfosmae"  
#[6] "Tianna 251 mwoqwzyfddhuunmtiioh"   
#[7] "Melody 4422 buagtfiaipniavdnsxhv"  
#[8] "Dallas 5343 blyjvtlpvpqondrdhluu"  
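
For what it's worth, the attempt in the question seems to go wrong because grep() is run against the whole line, so vowel pairs in NAME (such as "oe" in "Joe") count as matches too. Here is a minimal sketch of the difference, using a small sample of v; the names v_sample, whole_line, words, word_only and n_matches are made up for illustration:

```r
# Small sample taken from the question's vector v (v_sample is a made-up name).
v_sample <- c("Joe 4311 rsfuvgcozbxwlnnfevze",
              "Sage 540 aamyvegiadwjwpvwtjko",
              "Maria 1241 ngqjynxnpblcztnlkack",
              "Dallas 5343 blyjvtlpvpqondrdhluu")

# Matching on the whole line also picks up vowel pairs in NAME ("oe", "ia").
whole_line <- grepl("[aeiou]{2}", v_sample)

# Keep only the third space-separated field (WORD) and test that alone.
words <- sapply(strsplit(v_sample, " "), "[", 3)
word_only <- grepl("[aeiou]{2}", words)

# Number of lines whose WORD contains two consecutive vowels.
n_matches <- sum(word_only)
```

On the full vector, sum() over the same grepl() call should give the count the question asks for, rather than the matching lines themselves.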

Answer 1 (score: 2)

Here is how to do it in one pass. This regular expression skips everything before WORD and then looks for consecutive vowels in the last part.

> (zz <- do.call(rbind, lapply(v, function(x){ 
      grep("^.*[0-9]\\s.*[aeiou]{2}", x, value = TRUE)
      })))
     [,1]                                
[1,] "Sage 540 aamyvegiadwjwpvwtjko"     
[2,] "Mercedes 2400 xcwbxxljspneilwejutw"
[3,] "Micheal 4400 oovhyodyubhqwzdcwybf" 
[4,] "Brylee 2532 sarbmelbeycrnhytbout"  
[5,] "Amelia 5205 lcnvnjhamhzavdfosmae"  
[6,] "Tianna 251 mwoqwzyfddhuunmtiioh"   
[7,] "Melody 4422 buagtfiaipniavdnsxhv"  
[8,] "Dallas 5343 blyjvtlpvpqondrdhluu"  
> length(zz)
[1] 8
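
Since grep() and grepl() are already vectorized, the same pattern can probably be applied without lapply() and do.call(). This is a sketch only, on a small sample of v, with made-up names v_sample, pattern, hits, matched and n_matches:

```r
# Small sample taken from the question's vector v (v_sample is a made-up name).
v_sample <- c("Joe 4311 rsfuvgcozbxwlnnfevze",
              "Sage 540 aamyvegiadwjwpvwtjko",
              "Maria 1241 ngqjynxnpblcztnlkack",
              "Dallas 5343 blyjvtlpvpqondrdhluu")

# Same pattern as above: skip up to "digit + whitespace", then require
# two consecutive vowels somewhere in what follows (the WORD field).
pattern <- "^.*[0-9]\\s.*[aeiou]{2}"

hits <- grepl(pattern, v_sample)  # one logical per line
matched <- v_sample[hits]         # the matching lines
n_matches <- sum(hits)            # how many lines matched
```

This relies on the SCORE field being the only place where a digit is followed by whitespace, which holds for the data shown in the question.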

Answer 2 (score: 0)

I think your life will be easier if you make the three variables (name, score, word) explicit:

library(stringr)
df <- as.data.frame(str_split_fixed(v, " ", 3))
names(df) <- c("name", "score", "word")

Extracting the matches is then a simple subset:

subset(df, str_detect(word, "[aeiou]{2}"))

##        name score                 word
## 5      Sage   540 aamyvegiadwjwpvwtjko
## 8  Mercedes  2400 xcwbxxljspneilwejutw
## 9   Micheal  4400 oovhyodyubhqwzdcwybf
## 10   Brylee  2532 sarbmelbeycrnhytbout
## 15   Amelia  5205 lcnvnjhamhzavdfosmae
## 20   Tianna   251 mwoqwzyfddhuunmtiioh
## 23   Melody  4422 buagtfiaipniavdnsxhv
## 24   Dallas  5343 blyjvtlpvpqondrdhluu
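
If you would rather not depend on stringr, a roughly equivalent base R sketch is possible, assuming the three fields are always separated by whitespace and contain no quote or "#" characters. It uses a small sample of v, and the names v_sample, df and matches are made up here:

```r
# Small sample taken from the question's vector v (v_sample is a made-up name).
v_sample <- c("Joe 4311 rsfuvgcozbxwlnnfevze",
              "Sage 540 aamyvegiadwjwpvwtjko",
              "Maria 1241 ngqjynxnpblcztnlkack",
              "Dallas 5343 blyjvtlpvpqondrdhluu")

# Parse each element as one whitespace-separated record with three columns.
df <- read.table(text = v_sample,
                 col.names = c("name", "score", "word"),
                 stringsAsFactors = FALSE)

# Keep the rows whose word contains two consecutive vowels.
matches <- df[grepl("[aeiou]{2}", df$word), ]

# Count of matching rows.
nrow(matches)
```

One difference from the stringr version is that read.table() converts score to a number, whereas str_split_fixed() leaves every column as text.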