Finding all variants of a word in R

Time: 2016-01-16 14:29:27

Tags: regex r character

I have the following words.

words <- c("hail(0.75)", "hail0.75", "hail0.88", "hail075", "hail1.00",
           "hail1.75", "hail100", "hail125", "hail1.75)", "hail150",
           "hail175", "hail200", "hail225", "hail275", "hail450",
           "hail088", "hail75", "hail80", "hail88")
words
 [1] "hail(0.75)" "hail0.75"   "hail0.88"   "hail075"    "hail1.00"   "hail1.75"
 [7] "hail100"    "hail125"    "hail1.75)"  "hail150"    "hail175"    "hail200"
[13] "hail225"    "hail275"    "hail450"    "hail088"    "hail75"     "hail80"
[19] "hail88"

As you can see, hail0.75 is repeated in various misspelled or differently formatted forms (e.g. hail(0.75), hail075).

How can I find all occurrences of hail0.75, including the variants above?

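I tried

grep("hail[0,7,5]", words, value = T)
[1] "hail0.75" "hail0.88" "hail075"  "hail088"  "hail75"

to find the instances of hail that contain the digits 0, 7, 5. However, the result includes hail0.88 and hail088, which are not wanted, and excludes hail(0.75), which is wanted.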

2 answers:

Answer 0: (score: 2)

Another option is to remove all non-digit characters and use the result as an index:

idx <- gsub("[^[:digit:]]","",words)
words[idx=="075"]
[1] "hail(0.75)" "hail0.75"   "hail075"
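
To make the intermediate step visible, here is a small sketch of the same idea on a shortened vector. The vector below is only a subset of the one in the question, and the names `digits` and `matches` are illustrative, not part of the original answer.

```r
# Subset of the vector from the question (shortened for illustration)
words <- c("hail(0.75)", "hail0.75", "hail0.88", "hail075", "hail1.75", "hail75")

# Drop every character that is not a digit, leaving only the digit string
digits <- gsub("[^[:digit:]]", "", words)
digits
# [1] "075" "075" "088" "075" "175" "75"

# Keep the words whose digit string is exactly "075"
matches <- words[digits == "075"]
matches
# [1] "hail(0.75)" "hail0.75"   "hail075"
```

Note that "hail75" is left out because its digit string is "75", not "075". If that spelling should count as well (an assumption, the question does not say), something like `words[digits %in% c("075", "75")]` would include it.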

Answer 1: (score: 1)

Is this what you are looking for?

> x <- c("hail(0.75)", "hail0.75", "hail0.88", "hail075", "hail1.00", "hail1.75", "hail100", "hail125", "hail1.75)", "hail150", "hail175", "hail200", "hail225", "hail275", "hail450", "hail088", "hail75", "hail80", "hail88")
> x
 [1] "hail(0.75)" "hail0.75"   "hail0.88"   "hail075"    "hail1.00"
 [6] "hail1.75"   "hail100"    "hail125"    "hail1.75)"  "hail150"
[11] "hail175"    "hail200"    "hail225"    "hail275"    "hail450"
[16] "hail088"    "hail75"     "hail80"     "hail88"

Then you grep:

> x[grep("^hail[[:punct:]]*0[[:punct:]]*75.*", x)]
[1] "hail(0.75)" "hail0.75"   "hail075"

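This works under the assumption that 7 and 5 are always next to each other. A quick explanation: ^ matches the start of the string, [[:punct:]] is any punctuation character, and * means the preceding item (here [[:punct:]]) repeated 0 or more times.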
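
The pieces of the pattern can be checked on a few strings. The sketch below is an illustration rather than part of the original answer: it drops the trailing `.*` (which does not change what matches), uses `grepl` for a logical index, and adds a hypothetical variant with `0?` in case the leading zero should be optional. The names `pattern`, `loose_pattern`, `strict` and `loose` are made up for this example.

```r
# Subset of the vector from the question (shortened for illustration)
x <- c("hail(0.75)", "hail0.75", "hail0.88", "hail075", "hail1.75",
       "hail175", "hail1.75)", "hail75")

# Pattern from the answer: "hail" at the start, optional punctuation,
# a literal "0", optional punctuation, then "75"
pattern <- "^hail[[:punct:]]*0[[:punct:]]*75"
strict <- x[grepl(pattern, x)]
strict
# [1] "hail(0.75)" "hail0.75"   "hail075"

# Hypothetical variant: "0?" makes the leading zero optional,
# so "hail75" is accepted while "hail1.75" and "hail175" are still rejected
loose_pattern <- "^hail[[:punct:]]*0?[[:punct:]]*75"
loose <- x[grepl(loose_pattern, x)]
loose
# [1] "hail(0.75)" "hail0.75"   "hail075"    "hail75"
```

Whether "hail75" belongs in the result depends on the data, so the looser pattern is only worth using if that spelling is known to mean 0.75.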