Question

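The part of the text I'm targeting always begins with "Also there is" and ends with a period. The individual names between the commas are what I'm after (i.e. "randomperson" in the example below). The names will always be different. It gets tricky because the list also contains other items that are not single-word "names". Perhaps I could match everything between commas only when it is a single word/name, but I can't figure out how. The list of names can be longer or shorter, so the expression has to be dynamic rather than matching a fixed number of names.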

Target text:

Also there is a reinforced stone wall, a wooden wall, a stone wall, 
randomperson, a lumbering earth elemental, randomperson, randomperson,
randomperson.

(Split across several lines for readability.)

How can I solve this?
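The idea floated in the question, matching a comma-separated item only when it is a single word, can be written as one global match. This is a minimal Perl sketch, not taken from either answer below, and it assumes that a name is a run of \w characters (so hyphenated or multi-word names would be missed) and that the whole list sits on one line:

```perl
use strict;
use warnings;
use feature 'say';

my $text = "Also there is a reinforced stone wall, a wooden wall, a stone wall, "
         . "randomperson, a lumbering earth elemental, randomperson, "
         . "randomperson, randomperson.";

# Take only the list that follows the fixed opening phrase.
my ($list) = $text =~ /Also there is (.*)/;

# An item counts as one word when it starts at the beginning of the list or
# right after a comma (plus optional spaces), and is followed by a comma or
# the final period. Assumption: a "word" is \w+ only.
my @names = $list =~ /(?:^|,)\s*(\w+)(?=\s*[,.])/g;

say for @names;
```

Because the lookahead does not consume the following comma, the next match can start on that comma, so consecutive names are all found.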

Answer 1

In a program

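use strict;
use warnings;
use feature 'say';

my $text = "Also there is a reinforced stone wall, a wooden wall, a stone wall, "
    . "randomperson, a lumbering earth elemental, randomperson, "
    . "randomperson, randomperson.";

# Split the text after "Also there is" on punctuation, keep one-word items
my @single_words =
    grep { split == 1 }
    split /\s*[,.!;]\s*/,
        ($text =~ /Also there is (.*)/)[0];

say for @single_words;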

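The regex on $text grabs the text after that initial phrase, then split returns the list of strings between the commas (or other punctuation), and grep filters out the strings that have more than one word^†.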
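To see what each stage of that chain produces, here is a sketch that breaks it into named steps. The variable names are illustrative, and the word count is done with an explicit array instead of relying on split in scalar context:

```perl
use strict;
use warnings;
use feature 'say';

my $text = "Also there is a reinforced stone wall, a wooden wall, a stone wall, "
         . "randomperson, a lumbering earth elemental, randomperson, "
         . "randomperson, randomperson.";

# Stage 1: the part of the sentence after the fixed opening phrase.
my ($tail) = $text =~ /Also there is (.*)/;

# Stage 2: one field per list item; the \s* on both sides trims the spaces.
# The trailing empty field after the final period is dropped by split.
my @fields = split /\s*[,.!;]\s*/, $tail;

# Stage 3: keep only the fields that split ' ' turns into exactly one word.
my @single = grep { my @w = split ' ', $_; @w == 1 } @fields;

say scalar(@fields), " fields, ", scalar(@single), " single words";
say "[$_]" for @single;
```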

On the command line

echo "Also there is a reinforced stone wall, a wooden wall,..., randomperson,..." |
perl -wnE'say for
    grep { split == 1 }
    split /\s*[,.!;]\s*/, (/Also there is (.*)/)[0]'

This works the same way as the program above.

Please show us what you have tried, so that we can add further explanation and comments.

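^† A lone split uses the defaults, split ' ', $_, where ' ' is a special pattern that splits on \s+ and discards leading and trailing whitespace. But in the expression split == 1, split is in scalar context (imposed by the operator ==, which needs a single value on each side), so it returns the number of elements in the list, and that number is then compared with 1.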
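A small sketch of the behaviour the footnote describes: split ' ' assigned to a scalar yields the field count, and leading whitespace never produces an empty first field. The sample strings are made up for illustration:

```perl
use strict;
use warnings;

my %count;
for my $field (" randomperson", "a stone wall", "  a lumbering earth elemental  ", "") {
    # Scalar assignment puts split in scalar context, so it returns the number
    # of fields. The special pattern ' ' splits on runs of whitespace and
    # ignores leading whitespace; trailing empty fields are dropped.
    my $n = split ' ', $field;
    $count{$field} = $n;
    printf "%-34s -> %d\n", "'$field'", $n;
}
```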

Answer 2

Code

sed -r ':a
s/, ([a-zA-Z]*)([,\.])/\n##\1\n\2/
ta
' | sed -n 's/##//gp'

Output

randomperson
randomperson
randomperson
randomperson

Explanation:

Start the loop (:a defines the label that the loop jumps back to)

sed -r ':a

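Find each occurrence of ", oneword," or ", oneword." and replace it with ##oneword on a line of its own, moving the trailing comma or period to the next line. ## is a magic marker that makes the extracted names easy to identify later.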

s/, ([a-zA-Z]*)([,\.])/\n##\1\n\2/

End the loop: ta branches back to label a as long as the previous substitution succeeded

ta
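For comparison, the same substitute-until-nothing-matches loop can be mimicked in Perl with "1 while s///". This sketch reuses the pattern from the sed script and prints the resulting "pattern space", so the marker lines are visible. Like the sed version, it assumes the whole list is on a single line; a name that starts a new line (as in the wrapped example in the question) would not be preceded by ", " and would be missed:

```perl
use strict;
use warnings;

my $line = "Also there is a reinforced stone wall, a wooden wall, a stone wall, "
         . "randomperson, a lumbering earth elemental, randomperson, "
         . "randomperson, randomperson.";

# Same pattern as the sed script: ", word," or ", word." becomes
# "\n##word\n" followed by the original comma or period.
# Without /g, each s/// restarts from the beginning of the string, and
# "1 while" repeats it until it no longer matches, like the :a ... ta pair.
1 while $line =~ s/, ([a-zA-Z]*)([,.])/\n##$1\n$2/;

print $line, "\n";
```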

Keep only the lines that carry the ## marker, stripping the marker, so that just the single words remain

' | sed -n 's/##//gp'
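The filter step, sed -n 's/##//gp', prints a line only when the substitution actually changed it. A Perl sketch of that rule, fed with the lines the first sed emits for the example text (hard-coded here as a hypothetical input):

```perl
use strict;
use warnings;

# Hypothetical input: the lines produced by the first sed for the example.
my @lines = (
    "Also there is a reinforced stone wall, a wooden wall, a stone wall",
    "##randomperson",
    ", a lumbering earth elemental",
    "##randomperson",
    ",",
    "##randomperson",
    ",",
    "##randomperson",
    ".",
);

# Equivalent of sed -n 's/##//gp': s///g returns the number of replacements,
# so lines without the marker are skipped, like -n without a match.
my @names;
for my $l (@lines) {
    (my $copy = $l) =~ s/##//g or next;
    push @names, $copy;
}
print "$_\n" for @names;
```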
