Question: How to exclude submatches in Perl?

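I need to split a string into parts that contain either words or special characters.

Say I have the string 'This is 'another problem..."'. What I want to get is an array consisting of these parts: ('This', 'is', ''', 'another', 'problem', '...', '"').

I did this in JavaScript with the following RegExp, which works fine: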

string.match(/([^-\s\w])\1*|[-\w]+/g); // works

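The same approach does not work in Perl: because of the subpattern I use to group consecutive characters, I also get those submatches in the result: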

@matches = $string =~ m/(([^-\s\w])\2*|[-\w]+)/g; # does not work

Is there a way to remove the subpatterns/submatches, either from the result or in the regular expression itself?

Answer 1

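In your "does not work" example, I think you mean \2, not \1.

In list context, m//g returns every capture group of every match, so you have to iterate over the matches and keep only $1: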

push @matches, "$1" while $string =~ m/(([^-\s\w])\2*|[-\w]+)/g;
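
To make the difference between the two contexts concrete, here is a minimal sketch. The sample string is an assumption (reconstructed from the question), and the variable names @all and @matches are only illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample input, assumed for illustration.
my $string = q{This is 'another problem..."};

# List context: every capture group of every match is returned,
# so each token is followed by the inner group (undef for plain words).
my @all = $string =~ m/(([^-\s\w])\2*|[-\w]+)/g;

# Scalar context: one match per iteration, and only the outer group is kept.
my @matches;
push @matches, $1 while $string =~ m/(([^-\s\w])\2*|[-\w]+)/g;

print scalar(@all), " values in list context\n";
print join('|', @matches), "\n";
```

With seven tokens in the sample, the list-context match yields fourteen values, while the loop collects only the seven tokens.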

Answer 2

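# Option 1: drop the outer group and read the whole match from ${^MATCH} (needs /p, Perl 5.10+)
my @matches;
push @matches, ${^MATCH} while $string =~ /([^-\s\w])\1*|[-\w]+/pg;

# Option 2: keep the outer group and collect $1 one match at a time
my @matches;
push @matches, $1 while $string =~ /(([^-\s\w])\2*|[-\w]+)/g;

# Option 3: match in list context and keep every other value (the $1 values)
my $i = 0;
my @matches = grep ++$i % 2, $string =~ /(([^-\s\w])\2*|[-\w]+)/g;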
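
As a rough check that the ${^MATCH} variant and the grep variant agree, the following sketch runs both on a hypothetical sample string. The names @by_match and @by_grep are illustrative. The counter starts at 0 so that the 1st, 3rd, 5th, ... values (the outer-group captures) are the ones kept.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input.
my $string = q{a-b ??? c!!};

# /p makes the whole match available in ${^MATCH} (Perl 5.10 or later),
# so no outer capture group is needed.
my @by_match;
push @by_match, ${^MATCH} while $string =~ /([^-\s\w])\1*|[-\w]+/pg;

# List context returns ($1, $2, $1, $2, ...); keep the odd positions only.
my $i = 0;
my @by_grep = grep { ++$i % 2 } $string =~ /(([^-\s\w])\2*|[-\w]+)/g;

print join('|', @by_match), "\n";
print join('|', @by_grep), "\n";
```

Both lines print the same four tokens: a-b, ???, c and !!.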

Answer 3

In Perl, there is more than one way to do it (TMTOWTDI):

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my $str='Here\'s a (good, bad, ..., ?) example to be used in this "reg-ex" test.';

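# NB: the grep removes the empty fields that split leaves between delimiters
# (testing length rather than truth keeps a token such as "0")

my @matches = grep { defined($_) && length($_) } split(/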
  \s*             # discard possible leading whitespace
  (
    \.{3}         # ellipsis (must come before punct)
  |
    \w+\-\w+      # hyphenated words
  |
    \w+\'(?:\w+)? # compound words
  | 
    \w+           # other words
  | 
    [[:punct:]]   # other punctuation chars
  )
/x,$str);

print Dumper(\@matches);

This prints:

$VAR1 = [
      'Here\'s',
      'a',
      '(',
      'good',
      ',',
      'bad',
      ',',
      '...',
      ',',
      '?',
      ')',
      'example',
      'to',
      'be',
      'used',
      'in',
      'this',
      '"',
      'reg-ex',
      '"',
      'test',
      '.'
    ];
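
The same tokenization can also be sketched with a plain match instead of split. This is an alternative, not the approach used above: when a pattern contains no capturing groups, m//g in list context returns the whole matches, so there are no submatches to filter out. The pattern repeats the alternatives above, and the variable name @tokens is illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = 'Here\'s a (good, bad, ..., ?) example to be used in this "reg-ex" test.';

# No capturing groups, so list context yields one whole match per token.
my @tokens = $str =~ /
    \.{3}          # ellipsis (must come before punct)
  | \w+-\w+        # hyphenated words
  | \w+'\w*        # words with an apostrophe
  | \w+            # other words
  | [[:punct:]]    # other punctuation chars
/xg;

print scalar(@tokens), " tokens\n";
print join(' ', @tokens), "\n";
```

For this input the match produces the same 22 tokens as the split version, in the same order.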
