Regular expression: finding substrings in sentences

Time: 2018-11-26 05:09:20

Tags: regex perl

I have many sentences:

 1) the 3d line chart will show area in 3d.
 2) udcv123hi2ry32 the this line chart is useful.
 3) this chart.
 4) a chart.
 5) a line chart.
 6) this bar chart
 7) ...

My conditions are:

 1) Substrings start with 'a', 'the', 'this', or '[chart name]'.
 2) '[chart name] chart' is OK, but 'this chart' and 'a chart' are not accepted
    (e.g. bar chart, line chart, this line chart, a area chart: OK;
     this chart, a chart: not accepted).
 3) Substrings end with '.' (dot).

So I need to find the substrings that satisfy these conditions.

In this case, the strings

"this line chart is very useful." and
"area chart is very useful." are exactly what I want to receive.

I tried to do this with the following regular expression (https://regex101.com/r/aX5htr/2):

(a|the|this)* *((?!\bthis chart\b|\bwhich chart\b|\ba chart\b|\bthe chart\b|\bthat chart\b|\d+).+ chart) .+\.

But it does not match...

How can I solve this?

1 answer:

Answer 0 (score: 2)

You can use the following pattern (the original answer also links to a regex demo and an online Perl demo):

my $rx = qr/(?x)                 # enable formatting whitespace and comments
    (?(DEFINE)                   # Start DEFINE block
      (?<start>a|the|this|which) # Match start delimiters
    )                            # End DEFINE block
    (?<res>                      # Group res holding the match
      \b(?&start)\s+chart\b      # Match start delims, 1+ whitespace, chart
      (*SKIP)(*F)                # and skip the match
      |                          # or
      \b(?:(?&start)\s+)?        # Optional start delim and 1+ whitespace
      \w+\s+chart\b              # 1+ word chars, 1+ whitespace, chart, word boundary
      [^.]*                      # 0+ chars other than dot
    )                            # End of res group
/;
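The key part of the pattern above is the `(*SKIP)(*F)` pair: the first alternative matches an unwanted phrase, then forces a failure and tells the engine to resume searching after that phrase, so the second alternative never gets a chance to match inside it. Here is a minimal sketch of that idea. The sample text and the variable names `$text` and `@found` are made up for illustration, and the pattern is simplified to the single start word 'this'.

```perl
use strict;
use warnings;

# Hypothetical sample text, not taken from the answer's data.
my $text = "this chart and this bar chart";

# Alternative 1 matches "this chart", then (*SKIP)(*F) discards it and
# moves the search position past it.
# Alternative 2 matches an optional "this", one word, and "chart".
my @found = $text =~ /\bthis\s+chart\b(*SKIP)(*F)|\b(?:this\s+)?\w+\s+chart\b/g;

print "$_\n" for @found;   # prints: this bar chart
```

Without `(*SKIP)(*F)`, the second alternative could still match "this chart" by treating "this" as the `\w+` word, which is exactly what the rejected-phrase branch prevents.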

Full Perl script, which prints the `res` group for each matching line:

use strict;
use warnings;

my $rx = qr/(?x)                 # enable formatting whitespace and comments
    (?(DEFINE)                   # Start DEFINE block
      (?<start>a|the|this|which) # Match start delimiters
    )                            # End DEFINE block
    (?<res>                      # Group res holding the match
      \b(?&start)\s+chart\b      # Match start delims, 1+ whitespace, chart
      (*SKIP)(*F)                # and skip the match
      |                          # or
      \b(?:(?&start)\s+)?        # Optional start delim and 1+ whitespace
      \w+\s+chart\b              # 1+ word chars, 1+ whitespace, chart, word boundary
      [^.]*                      # 0+ chars other than dot
    )                            # End of res group
/;
while (<DATA>) {
    if (/$rx/) {
        print "$+{res}\n";
    }
}

__DATA__
this chart.
this line chart.
this bar chart.
21684564523 this chart.
556465465456 this a line chart.
a chart.
a line chart.
which chart.
all this chart.
a chart.
123123 this chart..
123123 which chart.
all this line chart.
a line chart.
the 3d line chart will show area in 3d.
line chart.
area chart.
the chart.
1221513513 line chart.
1234125135 the chart.
123123 this bar chart.
udcvhi2ry32 the this line chart is useful.
twl chart.