Remove non-A-Z characters from a vector in R

Asked: 2011-09-24 22:53:12

Tags: r letters

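I have a vector of usernames that contain non-A-Z characters. I'd like to be able to strip those characters out. I was told to use the letters vector, but y = x[letters] doesn't seem to work.

Thanks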

3 answers:

Answer 0: (score: 4)

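If x is your vector, use gsub with a pair of ranges inside a regular-expression character class, and replace every match with the empty string. A leading ^ inside the brackets negates the class, so the pattern matches any character that is not a letter: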

gsub("[^a-zA-Z]", "", x)

For example, with some simple data:

 gsub("[^a-zA-Z]", "", c(letters, LETTERS, "3s8t7a2c9k:o3v8e7r%F%L^O#W%&^%@#^"))
 [1] "a"             "b"             "c"             "d"             "e"             "f"             "g"             "h"            
 [9] "i"             "j"             "k"             "l"             "m"             "n"             "o"             "p"            
[17] "q"             "r"             "s"             "t"             "u"             "v"             "w"             "x"            
[25] "y"             "z"             "A"             "B"             "C"             "D"             "E"             "F"            
[33] "G"             "H"             "I"             "J"             "K"             "L"             "M"             "N"            
[41] "O"             "P"             "Q"             "R"             "S"             "T"             "U"             "V"            
[49] "W"             "X"             "Y"             "Z"             "stackoverFLOW"
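
As a minimal sketch of why `x[letters]` from the question does not do the job while `gsub` does: indexing with a character vector looks elements up by name, and an unnamed vector has no names to match. The usernames here are made up for illustration and are not from the original question.

```r
# Hypothetical usernames (an assumption for illustration)
x <- c("A!ex25", "Goerge?", "H@rry")

# Character indexing looks up element *names*; x has none,
# so each of the 26 lookups yields NA
by_name <- x[letters]

# gsub() is vectorised: every element is cleaned independently
cleaned <- gsub("[^a-zA-Z]", "", x)
print(cleaned)
```

`by_name` is a length-26 vector of `NA`, whereas `cleaned` has one stripped string per username.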

Answer 1: (score: 2)

Maybe this is what you want:

username <- "user12_AB"
# Keep only the letters of a single string s
strip_non_letters <- function(s) {
  # positions whose lower-cased character is one of a-z
  idx <- which(strsplit(tolower(s),"")[[1]] %in% letters)
  paste(strsplit(s, "")[[1]][idx], collapse="")
}
strip_non_letters(username)
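
Because the function takes `[[1]]` of the `strsplit` result, it handles one string at a time. Since the question is about a vector of usernames, one possible way to apply it element by element is `vapply`. This is only a sketch: the function body is a slightly condensed restatement of the one above, and the second username is an assumed example.

```r
# Condensed restatement of strip_non_letters() for a self-contained example
strip_non_letters <- function(s) {
  chars <- strsplit(s, "")[[1]]                 # split into single characters
  paste(chars[tolower(chars) %in% letters], collapse = "")
}

# "A!ex25" is an assumed extra username for illustration
usernames <- c("user12_AB", "A!ex25")

# vapply() calls the function once per element and checks that each
# result is a single character string
result <- vapply(usernames, strip_non_letters, character(1), USE.NAMES = FALSE)
print(result)
```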

Answer 2: (score: 1)

Similar to Karsten's answer above, hopefully not too redundant:

    usernames <- c("A!ex25","Goerge?","H@rry","Dumbname89")
    # a function to cut out non-letters
    onlyletters <- function(x){
      chars <- unlist(strsplit(x,split=""))
      charsout <- chars[chars%in%c(letters,LETTERS)]
      paste(charsout,sep="",collapse="")
    }
    sapply(usernames,onlyletters)
    #     A!ex25    Goerge?      H@rry Dumbname89 
    #      "Aex"   "Goerge"     "Hrry" "Dumbname" 
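
As a quick sanity check, the split-and-filter approach and the `gsub` pattern from the first answer should agree on this data. The sketch below restates `onlyletters` and passes `USE.NAMES = FALSE` so that `sapply` returns an unnamed vector that can be compared directly; the variable names `by_split` and `by_regex` are introduced here only for illustration.

```r
usernames <- c("A!ex25", "Goerge?", "H@rry", "Dumbname89")

# Split each string into characters and keep only a-z / A-Z
onlyletters <- function(x) {
  chars <- unlist(strsplit(x, split = ""))
  paste(chars[chars %in% c(letters, LETTERS)], collapse = "")
}

# USE.NAMES = FALSE drops the original strings as element names
by_split <- sapply(usernames, onlyletters, USE.NAMES = FALSE)
by_regex <- gsub("[^a-zA-Z]", "", usernames)

print(identical(by_split, by_regex))
```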