Searching for multiple patterns with grepl in R

Asked: 2014-10-06 18:34:03

Tags: r pattern-matching grepl

I have the following code. I want to find the cells that contain alphanumeric values, while ignoring cells whose value is na or NA.

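How should I modify the code? The required R command should return the following result for newcolumn:

TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE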

I tried commands 3 and 4 below, but they failed :(

> newcolumn=c(1,2,"na","NA","abc","","*")

> grepl("[[:alnum:]]", newcolumn)
[1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE

> grepl("[[:alnum:]] | na", newcolumn)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE

> grepl(c("[[:alnum:]]","na"), newcolumn)
[1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE
Warning message:
In grepl(c("[[:alnum:]]", "na"), newcolumn) :
  argument 'pattern' has length > 1 and only the first element will be used

> grepl("[[:alnum:]]" | "na" | "NA", newcolumn)
Error in "[[:alnum:]]" | "na" : 
  operations are possible only for numeric, logical or complex types

> str(newcolumn)
 chr [1:7] "1" "2" "na" "NA" "abc" "" "*"

=========================== UPDATE1 ===========================

newcolumn2<-newcolumn[grepl("(?=(?i)na(N)?(*SKIP)(*F))|[[:alnum:]]|(?=(?i)nan(*SKIP)(*F))|(?=(?i)null(*SKIP)(*F))", newcolumn, perl=TRUE)]

I updated the code above because I also want to recognize na, nan, null and their variants. However, the "null" part does not work. What should I change?

1 Answer:

Answer 0: (score: 1)

Try:

 grepl("(?=(?i)na(*SKIP)(*F))|[[:alnum:]]", newcolumn, perl=TRUE)
 #[1]  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE

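(?i) makes the match case-insensitive, so the first alternative matches na, NA, nA, and Na. The (*SKIP)(*F) verbs in the pattern force the match to fail whenever that alternative is reached. As a result, only the pattern to the right of the | symbol, i.e. [[:alnum:]], can produce a successful match.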
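To see what (*SKIP)(*F) does, it can help to look at where the first match starts. The sketch below is only an illustration: it uses the simpler form of the pattern without the lookahead (the same form as in the update further down), and the vector x and the variable names are made up for the example.

```r
# Illustrative input: "nab" contains "na" followed by another character.
x <- c("abc", "na", "NA", "nab")
pat <- "(?i)na(*SKIP)(*F)|[[:alnum:]]"

# regexpr() gives the 1-based start of the first match, or -1 if there is none.
starts <- as.integer(regexpr(pat, x, perl = TRUE))
hit <- grepl(pat, x, perl = TRUE)

print(data.frame(x, starts, hit))
```

If PCRE behaves as described above, "na" and "NA" give no match, while "nab" still matches at its third character, because the search resumes right after the skipped "na". So this approach rejects only strings made up entirely of the skipped text, which may or may not be what you want.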

Update

 newcolumn <- c(1,2,"na","NA","abc","","*", "NaN", "nan", "nAn")
 grepl("(?i)na(N)?(*SKIP)(*F)|[[:alnum:]]", newcolumn, perl=TRUE)
 # [1]  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
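
The update above does not cover the "null" case from the question. One possible alternative, offered here as a sketch rather than as part of the original answer, is to split the test into two plain grepl calls and combine them with &. It assumes that a cell should be dropped only when its whole content is na, nan, or null in any letter case; the names has_alnum, is_marker, and keep are illustrative.

```r
newcolumn <- c(1, 2, "na", "NA", "abc", "", "*", "NaN", "nan", "nAn", "null", "NULL")

# Step 1: cells that contain at least one alphanumeric character.
has_alnum <- grepl("[[:alnum:]]", newcolumn)

# Step 2: cells that consist solely of a missing-value marker (any case).
is_marker <- grepl("^(na|nan|null)$", newcolumn, ignore.case = TRUE)

# Keep alphanumeric cells that are not markers.
keep <- has_alnum & !is_marker
print(keep)

newcolumn2 <- newcolumn[keep]
print(newcolumn2)
```

Because each condition is a separate pattern, adding another marker only means extending the alternation in step 2, and no PCRE backtracking verbs are needed.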