Question

Suppose I have some original text:

here is some text that has a substring that I'm interested in embedded in it.

I need to match the text against a substring that appears in it, for example: "has a substring".

However, the original text and the match string may differ in whitespace. For example, the match text might be:

has a
substring

or

has  a substring

and/or the original text might be:

here is some
text that has
a substring that I'm interested in embedded in it.

The output I need from the program is:

here is some text that [match starts here]has a substring[match ends here] that I'm interested in embedded in it.

I also need to preserve the original whitespace pattern, adding only the start and end markers.

Any ideas on how to do this with Perl regular expressions? I tried, but ended up thoroughly confused.

Answer 1

It has been a while since I used Perl regular expressions, but:

$match =~ s/(has\s+a\s+substring)/[$1]/ig;

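This matches one or more whitespace characters, including newlines, between the words. It wraps the whole match in brackets while preserving the original spacing. It is not automatic, but it works.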

You could play games with this, such as taking the string "has a substring" and transforming it into "has\s+a\s+substring", to make this less painful.
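The transformation described above could be sketched roughly as follows. The helper name phrase_to_pattern is hypothetical, and the sketch assumes the search phrase contains only plain words (no regex metacharacters), so nothing is escaped here.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a plain search phrase into a
# whitespace-tolerant pattern. Assumes the phrase has no regex
# metacharacters, so the words are used as-is.
sub phrase_to_pattern {
    my ($phrase) = @_;
    # split ' ' splits on runs of whitespace (including newlines)
    # and ignores leading whitespace.
    my @words = split ' ', $phrase;
    return join '\s+', @words;
}

my $text = "here is some\ntext that has\na substring that I'm interested in embedded in it.";
my $pattern = phrase_to_pattern("has  a\nsubstring");

# Copy the text, then bracket each match; the original spacing is kept
# because the replacement reuses the captured text.
(my $marked = $text) =~ s/($pattern)/[$1]/ig;
print "$marked\n";
```

Because the replacement reuses the captured text, the line break between "has" and "a" in the original survives inside the brackets.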

Edit: incorporated ysth's comment that the \s metacharacter matches newlines, and hobbs's correction to my usage.

Answer 2

This pattern will match the string you are looking for:

(has\s+a\s+substring)

So, when the user enters a search string, replace any whitespace in it with \s+ to get the pattern. Then replace each match with [match starts here]$1[match ends here], where $1 is the matched text.

Answer 3

In a regular expression, you can use + to mean "one or more". Something like this

/has\s+a\s+substring/

matches has followed by one or more whitespace characters, then a, followed by one or more whitespace characters, followed by substring.

Putting this together with the substitution operator, you can say:

my $str = "here is some text that has     a  substring that I'm interested in embedded in it.";
$str =~ s/(has\s+a\s+substring)/\[match starts here]$1\[match ends here]/gs;

print $str;

The output is:

here is some text that [match starts here]has     a  substring[match ends here] that I'm interested in embedded in it.

Answer 4

As many have suggested, use \s+ to match the whitespace. Here is how you can do it automatically:

my $original = "here is some text that has a substring that I'm interested in embedded in it.";
my $search = "has a\nsubstring";

my $re = $search;
$re =~ s/\s+/\\s+/g;

# Capture the match and escape the "[" after $1 so it is not parsed as an array subscript
$original =~ s/\b($re)\b/[match starts here]$1\[match ends here]/g;

print $original;

Output:

here is some text that [match starts here]has a substring[match ends here] that I'm interested in embedded in it.

You will probably want to escape any metacharacters in the string. If anyone is interested, I can add that.
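One possible way to add that escaping is sketched below; it is an illustration, not the answerer's code. The helper name mark_phrase is hypothetical. It relies on Perl's built-in quotemeta, and it swaps \b for lookarounds on the assumption that a search phrase may begin or end with a non-word character.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap every whitespace-tolerant occurrence of
# $phrase in $text with start/end markers.
sub mark_phrase {
    my ($text, $phrase) = @_;
    my ($start, $end) = ('[match starts here]', '[match ends here]');

    # quotemeta escapes regex metacharacters in each word; the words are
    # then joined with \s+ so any run of whitespace matches between them.
    my $re = join '\s+', map { quotemeta($_) } split ' ', $phrase;

    # Lookarounds instead of \b (an assumption of this sketch), since the
    # phrase may begin or end with a non-word character such as "(" or ")".
    $text =~ s/(?<!\w)($re)(?!\w)/$start$1$end/g;
    return $text;
}

print mark_phrase('cost is 3.50  (approx.) today', '3.50 (approx.)'), "\n";
```

Without the escaping, the "." in "3.50" would match any character, so text such as "3x50" would also be marked.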

Answer 5

Here is an example of how you could do this.

#! /opt/perl/bin/perl
use strict;
use warnings;

my $submatch = "has a\nsubstring";

my $str = "
here is some
text that has
a substring that I'm interested in, embedded in it.
";

print substr_match($str, $submatch), "\n";

sub substr_match{
  my($string,$match) = @_;

  $match =~ s/\s+/\\s+/g;

  # This isn't safe the way it is now, you will need to sanitize $match
  $string =~ /\b$match\b/;
}

As it stands, this does nothing to check the $match variable for unsafe characters.
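As a rough sketch of how the function above might be extended, the variant below sanitizes the search string with quotemeta and returns the match offsets, so the caller can insert markers without disturbing the original whitespace. The name substr_match_span is hypothetical, and keeping \b assumes the phrase starts and ends with word characters.

```perl
use strict;
use warnings;

# Hypothetical variant of substr_match: returns the (start, end) offsets
# of the first match, or an empty list when nothing matches.
sub substr_match_span {
    my ($string, $match) = @_;

    # Escape each word, then allow any run of whitespace between words.
    my $re = join '\s+', map { quotemeta($_) } split ' ', $match;

    return unless $string =~ /\b$re\b/;
    # $-[0] and $+[0] hold the start and end offsets of the whole match.
    return ($-[0], $+[0]);
}

my $str = "\nhere is some\ntext that has\na substring that I'm interested in, embedded in it.\n";

my $marked = $str;
if (my ($start, $end) = substr_match_span($str, "has a\nsubstring")) {
    # Insert the end marker first so the start offset stays valid.
    substr($marked, $end,   0, '[match ends here]');
    substr($marked, $start, 0, '[match starts here]');
}
print $marked;
```

Working from offsets rather than a substitution leaves the rest of the string untouched and handles only the first match; a loop with the /g modifier would be needed to mark every occurrence.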

How do I preserve whitespace when I match and replace several words in Perl?
