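R regular expression: semicolons

Time: 2016-05-03 11:37:12

Tags: regex r grepl

I want to check whether every element of a list matches the pattern I need; if not, I will stop the whole script.

An example list looks like this: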

[1]
Archaea;Euryarchaeota;Methanobacteria;Methanobacteriales;Methanobacteriaceae;Methanobrevibacter;
[2]
Archaea;Euryarchaeota;Methanobacteria;Methanobacteriales;Methanobacteriaceae;Methanosphaera;
[3]
Archaea;Euryarchaeota;Methanobacteria;Methanobacteriales;Methanobacteriaceae;Methanosphaera;
[4]
Bacteria;Actinobacteria;Actinobacteria;Bifidobacteriales;Bifidobacteriaceae;Bifidobacterium;
[5]
Bacteria;Actinobacteria;Actinobacteria;Bifidobacteriales;Bifidobacteriaceae;Bifidobacterium;
[6]
Bacteria;Actinobacteria;Actinobacteria;Bifidobacteriales;Bifidobacteriaceae;Bifidobacterium;
[7]
Bacteria;Actinobacteria;Actinobacteria;Coriobacteriales;Coriobacteriaceae;Gordonibacter;
[8]
Bacteria;Actinobacteria;Coriobacteriia;Coriobacteriales;Coriobacteriaceae;;
[9]
Bacteria;Actinobacteria;Coriobacteriia;Coriobacteriales;Coriobacteriaceae;;

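I want all of my entries to have six semicolons. I tried pattern matching with grepl, but I am having trouble finding the right pattern. This is what I tried: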

if(!any(grepl(";{6}", taxonomy))) {
  # Throw an error message if the taxonomy is not in the right format
  stop("Wrong number of taxonomic classes\n",
       "Taxonomic levels have to be separated by semicolons (six in total). ",
       "IMPORTANT: if taxonomic information at any level is missing, the semicolons are still needed:\n",
       "e.g.Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;\n",
       "e.g.Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;;")
} else {

But I always end up in the stop branch, even when the entries look correct.

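3 answers:

Answer 0 (score: 2)

count.fields returns the number of fields in each line of the file or connection given as its first argument, using the sep argument as the field separator. An entry with six semicolons therefore has seven fields. No packages are used.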

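f <- function(x) {
  con <- textConnection(x)
  on.exit(close(con))  # close the text connection when the function exits
  # six semicolons split a line into seven fields
  ok <- count.fields(con, sep = ";") == 7
  if (any(!ok)) stop("these row numbers do not have 7 fields: ",
                     paste(which(!ok), collapse = ", "))
  # add whatever other code you need
}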

Testing it out:

# x has 2 components having 7 and 3 semicolon-separated fields respectively
x <- c("Archaea;Euryarchaeota;Methanobacteria;Methanobacteriales;Methanobacteriaceae;Methanobrevibacter;", ";;")
f(x)
## Error in f(x) : these row numbers do not have 7 fields: 2

See ?count.fields and ?textConnection.

Answer 1 (score: 1)

;{6}

matches only six consecutive semicolons (";;;;;;") and nothing else. You want to check for something like

(?:[^;]*;){6}

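which matches if (at least) 6 semicolons occur in the string.

If you need to assert that each line you test contains exactly 6 semicolons, you need to be more specific: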

^(?:[^;]*;){6}[^;]*$

where ^ and $ are the start-of-string and end-of-string anchors, and [^;]* is a negated character class that matches any number of characters other than a semicolon.

R code
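To make the difference between the three patterns concrete, here is a small sketch run against made-up strings (the names and values in x are hypothetical, not taken from the question). It uses plain capturing groups instead of (?:...) so that the patterns do not depend on the regex engine in use; the matching behaviour is otherwise the same.

```r
# Hypothetical test strings with 5, 6, and 7 semicolons, plus six in a row
x <- c(five  = "a;b;c;d;e;",
       six   = "a;b;c;d;e;f;",
       seven = "a;b;c;d;e;f;g;",
       run   = ";;;;;;")

consecutive <- grepl(";{6}", x)                # six semicolons in a row
at_least    <- grepl("([^;]*;){6}", x)         # six or more semicolons anywhere
exactly     <- grepl("^([^;]*;){6}[^;]*$", x)  # exactly six semicolons in total

# One row per test string, one column per pattern
print(data.frame(string = x, consecutive, at_least, exactly))
```

Only the anchored pattern rejects both the string with too few semicolons and the one with too many.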

> x<-c('Archaea;Euryarchaeota;Methanobacteria;Methanobacteriales;Methanobacteriaceae;Methanobrevibacter;',
  'Archaea;Euryarchaeota;Methanobacteria;Methanobacteriales;Methanobacteriaceae;Methanosphaera;',
  'Archaea;Euryarchaeota;Methanobacteria;Methanobacteriales;Methanobacteriaceae;Methanosphaera;',
  'Bacteria;Actinobacteria;Actinobacteria;Bifidobacteriales;Bifidobacteriaceae;Bifidobacterium;',
  'Bacteria;Actinobacteria;Actinobacteria;Bifidobacteriales;Bifidobacteriaceae;Bifidobacterium;',
  'Bacteria;Actinobacteria;Actinobacteria;Bifidobacteriales;Bifidobacteriaceae;Bifidobacterium;',
  'Bacteria;Actinobacteria;Actinobacteria;Coriobacteriales;Coriobacteriaceae;Gordonibacter;',
  'Bacteria;Actinobacteria;Coriobacteriia;Coriobacteriales;Coriobacteriaceae;;',
  'Bacteria;Actinobacteria;Coriobacteriia;Coriobacteriales;Coriobacteriaceae;;')
> grepl("^(?:[^;]*;){6}[^;]*$", x)
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[9] TRUE
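
To stop the script as the question asks, the vector returned by grepl still has to be reduced to a single condition. The following is only a sketch: check_taxonomy is a hypothetical helper name, and the pattern is written with a plain capturing group. Note that it tests !all(ok) rather than !any(...), because !any(...) is TRUE only when no entry at all matches.

```r
# Hypothetical helper: stop unless every entry has exactly six semicolons
check_taxonomy <- function(taxonomy) {
  ok <- grepl("^([^;]*;){6}[^;]*$", taxonomy)
  if (!all(ok)) {
    stop("Wrong number of taxonomic classes in entries: ",
         paste(which(!ok), collapse = ", "))
  }
  invisible(TRUE)
}

good <- c("Bacteria;Actinobacteria;Coriobacteriia;Coriobacteriales;Coriobacteriaceae;;",
          "Archaea;Euryarchaeota;Methanobacteria;Methanobacteriales;Methanobacteriaceae;Methanosphaera;")
bad <- c(good, "Bacteria;Actinobacteria;")  # third entry has only two semicolons

check_taxonomy(good)  # passes silently

# Capture the error message instead of aborting, for demonstration only
result <- tryCatch(check_taxonomy(bad),
                   error = function(e) conditionMessage(e))
print(result)
```

In a real script the tryCatch wrapper would be dropped so that stop() actually halts execution.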

Answer 2 (score: 0)

Using library(stringr):

which(str_count(taxonomy, ';') == 6)

You can do something similar with grepl:

grepl(6, str_count(taxonomy, ';'))
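
One caveat with the grepl variant: grepl(6, ...) does a substring match on the count, so it also accepts counts such as 16 or 60. The sketch below illustrates this with made-up data (the taxonomy vector here is hypothetical) and counts semicolons with base R only, as an assumed alternative to str_count.

```r
# Hypothetical sample: one entry with 6 semicolons, one with 16
taxonomy <- c("a;b;c;d;e;f;", strrep(";", 16))

# Count semicolons per element without extra packages
n_semicolons <- lengths(regmatches(taxonomy,
                                   gregexpr(";", taxonomy, fixed = TRUE)))

exact_match <- n_semicolons == 6       # strict numeric comparison
loose_match <- grepl(6, n_semicolons)  # substring match: "6" is found in "16" too

print(data.frame(n_semicolons, exact_match, loose_match))
```

For validation purposes the == 6 comparison is the stricter of the two.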
