How do I make my regex match the first pattern instead of the last?

Time: 2008-10-24 21:09:01

Tags: regex perl

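I'm probably going about this the wrong way. I have a text file full of data, and I want to match and replace a pattern made up of an "item" and a "catalog number". The order of the elements in the file is very important, so I want to start matching/replacing at the top of the file and work my way down.

The snippet below actually works, but when I run it, it first replaces the 3rd instance of the "SeaMonkeys" and "SMKY-1978" pattern, and then it replaces the 2nd instance. What I want is for it to replace the 1st instance of the pattern, and then the 2nd.

So I want the output to say "Found Kurt's SMKY-1978 SeaMonkeys", then "Found Shane's SMKY-1978 SeaMonkeys", and then leave Mick's SMKY-1978 SeaMonkeys alone, because I only want to find and replace the first 2 instances of the pattern. Right now it says "Found Shane's SMKY-1978 SeaMonkeys" and "Found Mick's SMKY-1978 SeaMonkeys", because it matches the last remaining instance on each pass through the for loop.

So, am I missing some subtle, little-known regex character, or am I going about this in completely the wrong way?

Here is the working code: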

# my regexp matches from the bottom to the top but I'd like it to replace from the top down
local $/=undef;
my $DataToParse = <DATA>;
my $item = "SeaMonkeys";
my $catNum = "SMKY-1978";
my $maxInstancesToReplace = 2;
parseData();
exit();

sub parseData {
    for (my $counter = 0; $counter < $maxInstancesToReplace; $counter++) {
        # Stick in a temporary text placeholder that I will replace later after more processing
        $DataToParse =~ s/(.+)\sELEMENT\s(.+?)\s\(Item := \"$item\".+?CatalogNumber := \"$catNum.+?END_ELEMENT(.+)/$1 ***** Found $2\'s $catNum $item. (counter: $counter) *****$3/s;
    } 
    print("Here's the result:\n$DataToParse\n");
}

__DATA__
    ELEMENT Kurt (Item := "BrightLite",
                  ItemID := 29,
                  CatalogNumber := "BTLT-9274",
                  Vendor := 100,
    END_ELEMENT

    ELEMENT Mick (Item := "PetRock",
                  ItemID := 36,
                  CatalogNumber := "PTRK-3475/A",
                  Vendor := 82,
    END_ELEMENT

    ELEMENT Kurt (Item := "SeaMonkeys",
                  ItemID := 12,
                  CatalogNumber := "SMKY-1978/E",
                  Vendor := 77,
    END_ELEMENT

    ELEMENT Joe (Item := "Pong",
                 ItemID := 24,
                 CatalogNumber := "PONG-1482",
                 Vendor := 5,
    END_ELEMENT

    ELEMENT Shane (Item := "SeaMonkeys",
                   ItemID := 1032,
                   CatalogNumber := "SMKY-1978/E",
                   Vendor := 77,
    END_ELEMENT

    ELEMENT Kurt (Item := "Battleship",
                  ItemID := 99,
                  CatalogNumber := "BTLS-5234",
                  Vendor := 529,
    END_ELEMENT

    ELEMENT Mick (Item := "SeaMonkeys",
                  ItemID := 8,
                  CatalogNumber := "SMKY-1978/F",
                  Vendor := 77,
    END_ELEMENT

    ELEMENT Frank (Item := "PetRock",
                   ItemID := 42,
                   CatalogNumber := "PTRK-3475/B",
                   Vendor := 82,
    END_ELEMENT

    ELEMENT Joe (Item := "SeaMonkeys",
                 ItemID := 8,
                 CatalogNumber := "SMKY-1979/A",
                 Vendor := 77,
    END_ELEMENT

Here is what it currently outputs:

Here's the result:
        ELEMENT Kurt (Item := "BrightLite",
                      ItemID := 29,
                      CatalogNumber := "BTLT-9274",
                      Vendor := 100,
        END_ELEMENT

        ELEMENT Mick (Item := "PetRock",
                      ItemID := 36,
                      CatalogNumber := "PTRK-3475/A",
                      Vendor := 82,
        END_ELEMENT

        ELEMENT Kurt (Item := "SeaMonkeys",
                      ItemID := 12,
                      CatalogNumber := "SMKY-1978/E",
                      Vendor := 77,
        END_ELEMENT

        ELEMENT Joe (Item := "Pong",
                     ItemID := 24,
                     CatalogNumber := "PONG-1482",
                     Vendor := 5,
        END_ELEMENT

 ***** Found Shane's SMKY-1978 SeaMonkeys. (counter: 1) *****

        ELEMENT Kurt (Item := "Battleship",
                      ItemID := 99,
                      CatalogNumber := "BTLS-5234",
                      Vendor := 529,
        END_ELEMENT

 ***** Found Mick's SMKY-1978 SeaMonkeys. (counter: 0) *****

        ELEMENT Frank (Item := "PetRock",
                       ItemID := 42,
                       CatalogNumber := "PTRK-3475/B",
                       Vendor := 82,
        END_ELEMENT

        ELEMENT Joe (Item := "SeaMonkeys",
                     ItemID := 8,
                     CatalogNumber := "SMKY-1979/A",
                     Vendor := 77,
        END_ELEMENT

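2 answers:

Answer 0: (score: 11)

The .+ at the start of your regex is "greedy", which means it matches as many characters as it can. It swallows everything up to the last place where the rest of the pattern can still match, and that is why the last instance gets replaced first.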
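To make the greedy behaviour concrete, here is a minimal sketch. The sample string and variable names are made up for illustration and are not part of the question's data.

```perl
use strict;
use warnings;

# Hypothetical sample: three "X" markers on one line.
my $text = "a X b X c X d";

# Greedy: the leading (.+) grabs as much as it can and only gives back
# what it must, so the X in the pattern lines up with the LAST X.
my ($greedy_prefix) = $text =~ /(.+)X/;

# Non-greedy: (.+?) grabs as little as it can, so the X in the pattern
# lines up with the FIRST X.
my ($lazy_prefix) = $text =~ /(.+?)X/;

print "greedy prefix: '$greedy_prefix'\n";   # 'a X b X c '
print "lazy prefix:   '$lazy_prefix'\n";     # 'a '
```

The question's pattern starts with a greedy (.+), so each pass behaves like the first case.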

Your regex would be better written like this (it will be more readable, and faster):

# /s lets . match newlines; (\S+) keeps the name capture from running across elements
my $re=qr/\sELEMENT\s(\S+)\s\(Item := "$item".+?CatalogNumber := "$catNum.+?END_ELEMENT/s;

I think you can simply repeat this match:

sub parseData {
    # /s must be on the qr// itself: flags on the outer s/// do not apply to an interpolated qr// object
    my $re=qr/\sELEMENT\s(\S+)\s\(Item := "$item".+?CatalogNumber := "$catNum.+?END_ELEMENT/s;
    # 0 .. N-1 runs exactly $maxInstancesToReplace times
    foreach my $counter (0..$maxInstancesToReplace-1) {
      # Stick in a temporary text placeholder that I will replace later after more processing
      $DataToParse =~ s/$re/ ***** Found $1\'s $catNum $item. (counter: $counter) *****/;
    } 
    print("Here's the result:\n$DataToParse\n");
}

If repeating the substitution is not an option, you should use the /e regex modifier.
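The answer does not spell out how /e would be used, so the following is only one possible reading: a single substitution with /g and /e, where the replacement is Perl code that counts matches and stops rewriting after the limit. The shortened data, the names $count and $max, and the use of $& to put later matches back unchanged are assumptions made for this sketch.

```perl
use strict;
use warnings;

my $item    = "SeaMonkeys";
my $cat_num = "SMKY-1978";
my $max     = 2;    # hypothetical limit, mirrors $maxInstancesToReplace

# Hypothetical, shortened data in the same shape as the question's file.
my $data = <<'EOT';
    ELEMENT Kurt (Item := "SeaMonkeys",
                  CatalogNumber := "SMKY-1978/E",
    END_ELEMENT

    ELEMENT Joe (Item := "Pong",
                 CatalogNumber := "PONG-1482",
    END_ELEMENT

    ELEMENT Shane (Item := "SeaMonkeys",
                   CatalogNumber := "SMKY-1978/E",
    END_ELEMENT

    ELEMENT Mick (Item := "SeaMonkeys",
                  CatalogNumber := "SMKY-1978/F",
    END_ELEMENT
EOT

my $re = qr/\sELEMENT\s(\S+)\s\(Item := "\Q$item\E".+?CatalogNumber := "\Q$cat_num\E.+?END_ELEMENT/s;

my $count = 0;
# /g walks the matches from the top of the string down;
# /e evaluates the replacement as Perl code for each match.
$data =~ s{$re}{
    $count < $max
        ? sprintf(" ***** Found %s's %s %s. (counter: %d) *****",
                  $1, $cat_num, $item, $count++)
        : $&    # past the limit: put the matched text back unchanged
}ge;

print $data;
```

With this sample, Kurt's and Shane's elements are replaced (counters 0 and 1) and Mick's is left alone. Because everything happens in one pass, the string is scanned once instead of once per replacement.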

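Answer 1: (score: -1)

The best solution seems to be to pull each ELEMENT ... END_ELEMENT section out of the data and run the regex against one section at a time, rather than feeding the entire data set to the regex in one go. It's not exactly what I was trying to accomplish, but I rewrote my program to do this piecemeal processing and it works like a charm.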
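The rewritten program is not shown, so what follows is only a rough sketch of what processing one section at a time could look like, not the poster's actual code. The shortened data, the variable names, and the choice of split with a capturing group are all assumptions.

```perl
use strict;
use warnings;

my $item    = "SeaMonkeys";
my $cat_num = "SMKY-1978";
my $max     = 2;    # hypothetical limit, mirrors $maxInstancesToReplace

# Hypothetical, shortened data in the same shape as the question's file.
my $data = <<'EOT';
    ELEMENT Kurt (Item := "SeaMonkeys",
                  CatalogNumber := "SMKY-1978/E",
    END_ELEMENT

    ELEMENT Joe (Item := "SeaMonkeys",
                 CatalogNumber := "SMKY-1979/A",
    END_ELEMENT

    ELEMENT Shane (Item := "SeaMonkeys",
                   CatalogNumber := "SMKY-1978/E",
    END_ELEMENT

    ELEMENT Mick (Item := "SeaMonkeys",
                  CatalogNumber := "SMKY-1978/F",
    END_ELEMENT
EOT

my $found  = 0;
my $result = '';

# The capturing group makes split return the element blocks themselves
# as well as the text between them, so the pieces can be glued back together.
for my $chunk (split /(\bELEMENT\s.*?END_ELEMENT)/s, $data) {
    # Only element blocks for the wanted item yield a name here.
    my ($name) = $chunk =~ /\AELEMENT\s+(\S+)\s+\(Item := "\Q$item\E"/;

    if (   $found < $max
        && defined $name
        && $chunk =~ /CatalogNumber := "\Q$cat_num\E/)
    {
        $result .= sprintf("***** Found %s's %s %s. (counter: %d) *****",
                           $name, $cat_num, $item, $found++);
    }
    else {
        $result .= $chunk;    # everything else passes through unchanged
    }
}

print $result;
```

Since each regex only ever sees one element, a match cannot run from one element into the next. In this sample, Joe's SMKY-1979 element is skipped even though its item is SeaMonkeys.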