Regex to match up to the first closing parenthesis

Time: 2017-03-07 11:16:44

Tags: r regex stringr

I have a character vector named cars, as shown below:

cars
[1] "Only one car(52;model-14557) had a good engine(workable condition), others engine were damaged beyond repair"   
[2] "Other car(21, model-155) looked in good condition but car ( 36, model-8878) looked to be in terrible condition."

I need to extract the following parts from the strings:

car(52;model-14557)
car(21, model-155)
car ( 36, model-8878)

I tried to extract them with the following:

stringr::str_extract_all(cars, "(.car\\s{0,5}\\(([^]]+)\\))")

This gave me the following output:

[[1]]
[1] " car(52;model-14557) had a good engine(workable condition)"

[[2]]
[1] " car(21, model-155) looked in good condition but car ( 36, model-8878)"

Is there a way to extract the word car together with its associated number and model?

1 answer:

Answer 0: (score: 2)

Your regex does not work because [^]]+ matches one or more characters other than ]. Since the quantifier is greedy and your strings contain no ], the match runs from the first ( all the way to the last ) in the string.

Use

\bcar\s*\([^()]+\)

See the online regex demo.

It matches:

  • \b - a word boundary
  • car - the literal character sequence
  • \s* - 0+ whitespaces
  • \( - a literal (
  • [^()]+ - one or more characters other than ( and )
  • \) - a literal )

R demo:

> cars <- c("Only one car(52;model-14557) had a good engine(workable condition), others engine were damaged beyond repair","Other car(21, model-155) looked in good condition but car ( 36, model-8878) looked to be in terrible condition.")
> library(stringr)
> str_extract_all(cars, "\\bcar\\s*\\([^()]+\\)")
[[1]]
[1] "car(52;model-14557)"

[[2]]
[1] "car(21, model-155)"    "car ( 36, model-8878)"

Note that the same regex yields the same results with the following base R code:

> regmatches(cars, gregexpr("\\bcar\\s*\\([^()]+\\)", cars))
[[1]]
[1] "car(52;model-14557)"

[[2]]
[1] "car(21, model-155)"    "car ( 36, model-8878)"

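To see side by side why the negated class [^()] stops at the first closing parenthesis while [^]] does not, here is a minimal sketch in base R. It assumes perl = TRUE (PCRE) for both patterns. extract_all is a hypothetical helper name introduced only for this illustration. The leading . from the question's pattern is dropped so that only the greediness differs.

```r
cars <- c(
  "Only one car(52;model-14557) had a good engine(workable condition), others engine were damaged beyond repair",
  "Other car(21, model-155) looked in good condition but car ( 36, model-8878) looked to be in terrible condition."
)

# Hypothetical helper: all matches of `pattern` in each element of `x`
extract_all <- function(pattern, x) {
  regmatches(x, gregexpr(pattern, x, perl = TRUE))
}

# [^]]+ accepts "(" and ")", so the greedy match extends to the LAST ")"
greedy <- extract_all("car\\s{0,5}\\(([^]]+)\\)", cars)

# [^()]+ cannot cross a parenthesis, so each match ends at the FIRST ")"
negated <- extract_all("\\bcar\\s*\\([^()]+\\)", cars)

print(greedy[[1]])   # "car(52;model-14557) had a good engine(workable condition)"
print(negated[[1]])  # "car(52;model-14557)"
print(negated[[2]])  # "car(21, model-155)" "car ( 36, model-8878)"
```

The \b in the second pattern also keeps words that merely end in car from matching, which the question's .car would allow.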