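I may be going about this the wrong way. I have a text file full of data, and I want to match and replace patterns of "item" and "catalog number" in it. The order of the elements in the file matters a great deal, so I want the match/replace to start at the top of the file and work its way down.
The code snippet below does work, but when I run it, it replaces the third instance of the "SeaMonkeys" / "SMKY-1978" pattern first, and then the second instance. What I want is for it to replace the first instance of the pattern, and then the second.
So I want the output to say "Found Kurt's SMKY-1978 SeaMonkeys" and then "Found Shane's SMKY-1978 SeaMonkeys", and to leave Mick's SMKY-1978 SeaMonkeys alone, because I only want to find and replace the first 2 instances of the pattern. Right now it says "Found Shane's SMKY-1978 SeaMonkeys" and "Found Mick's SMKY-1978 SeaMonkeys", because it matches the last occurrence of the pattern each time through the for loop.
So, am I missing some subtle, little-known regex character, or am I just going about this completely and utterly wrong?
Here is the working code: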
# my regexp matches from the bottom to the top but I'd like it to replace from the top down
local $/=undef;
my $DataToParse = <DATA>;
my $item = "SeaMonkeys";
my $catNum = "SMKY-1978";
my $maxInstancesToReplace = 2;
parseData();
exit();
sub parseData {
for (my $counter = 0; $counter < $maxInstancesToReplace; $counter++) {
# Stick in a temporary text placeholder that I will replace later after more processing
$DataToParse =~ s/(.+)\sELEMENT\s(.+?)\s\(Item := \"$item\".+?CatalogNumber := \"$catNum.+?END_ELEMENT(.+)/$1 ***** Found $2\'s $catNum $item. (counter: $counter) *****$3/s;
}
print("Here's the result:\n$DataToParse\n");
}
__DATA__
ELEMENT Kurt (Item := "BrightLite",
ItemID := 29,
CatalogNumber := "BTLT-9274",
Vendor := 100,
END_ELEMENT
ELEMENT Mick (Item := "PetRock",
ItemID := 36,
CatalogNumber := "PTRK-3475/A",
Vendor := 82,
END_ELEMENT
ELEMENT Kurt (Item := "SeaMonkeys",
ItemID := 12,
CatalogNumber := "SMKY-1978/E",
Vendor := 77,
END_ELEMENT
ELEMENT Joe (Item := "Pong",
ItemID := 24,
CatalogNumber := "PONG-1482",
Vendor := 5,
END_ELEMENT
ELEMENT Shane (Item := "SeaMonkeys",
ItemID := 1032,
CatalogNumber := "SMKY-1978/E",
Vendor := 77,
END_ELEMENT
ELEMENT Kurt (Item := "Battleship",
ItemID := 99,
CatalogNumber := "BTLS-5234",
Vendor := 529,
END_ELEMENT
ELEMENT Mick (Item := "SeaMonkeys",
ItemID := 8,
CatalogNumber := "SMKY-1978/F",
Vendor := 77,
END_ELEMENT
ELEMENT Frank (Item := "PetRock",
ItemID := 42,
CatalogNumber := "PTRK-3475/B",
Vendor := 82,
END_ELEMENT
ELEMENT Joe (Item := "SeaMonkeys",
ItemID := 8,
CatalogNumber := "SMKY-1979/A",
Vendor := 77,
END_ELEMENT
Here is what it currently outputs:
Here's the result:
ELEMENT Kurt (Item := "BrightLite",
ItemID := 29,
CatalogNumber := "BTLT-9274",
Vendor := 100,
END_ELEMENT
ELEMENT Mick (Item := "PetRock",
ItemID := 36,
CatalogNumber := "PTRK-3475/A",
Vendor := 82,
END_ELEMENT
ELEMENT Kurt (Item := "SeaMonkeys",
ItemID := 12,
CatalogNumber := "SMKY-1978/E",
Vendor := 77,
END_ELEMENT
ELEMENT Joe (Item := "Pong",
ItemID := 24,
CatalogNumber := "PONG-1482",
Vendor := 5,
END_ELEMENT ***** Found Shane's SMKY-1978 SeaMonkeys. (counter: 1) *****
ELEMENT Kurt (Item := "Battleship",
ItemID := 99,
CatalogNumber := "BTLS-5234",
Vendor := 529,
END_ELEMENT ***** Found Mick's SMKY-1978 SeaMonkeys. (counter: 0) *****
ELEMENT Frank (Item := "PetRock",
ItemID := 42,
CatalogNumber := "PTRK-3475/B",
Vendor := 82,
END_ELEMENT
ELEMENT Joe (Item := "SeaMonkeys",
ItemID := 8,
CatalogNumber := "SMKY-1979/A",
Vendor := 77,
END_ELEMENT
Answer 0 (score: 11)
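The .+ at the start of your regex is "greedy": it matches as many characters as possible. It therefore swallows nearly the whole file and only backtracks far enough to find the last matching ELEMENT, which is why the replacements proceed from the bottom up.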
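To make the effect of the greedy prefix concrete, here is a minimal sketch. The miniature input string and the ITEM record format are made up for illustration and are not the asker's data; only the presence or absence of the leading .+ differs between the two patterns.

```perl
use strict;
use warnings;

# Hypothetical miniature input: three records, two of which end in "=B".
my $text = "x ITEM a=B; ITEM b=B; ITEM c=Z;";

# Greedy leading .+ : it grabs as much as it can and then backtracks,
# so the LAST matching record is the one that gets captured.
my ($greedy) = $text =~ /.+\sITEM\s(\w+)=B;/s;

# No leading .+ : the engine scans from left to right,
# so the FIRST matching record is captured.
my ($first) = $text =~ /\sITEM\s(\w+)=B;/s;

print "greedy: $greedy, first: $first\n";   # greedy: b, first: a
```

Dropping the leading .+ (and the trailing one) is enough to change which occurrence a single substitution picks.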
Your regex would be better written like this (it is more readable, and faster):
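# No leading .+; /s goes on the qr// so "." can cross newlines; \Q..\E quotes the variables; the (?!END_ELEMENT) guard keeps a match inside one element
my $re = qr/\bELEMENT\s+(\S+)\s+\(Item := "\Q$item\E"(?:(?!END_ELEMENT).)*?CatalogNumber := "\Q$catNum\E(?:(?!END_ELEMENT).)*?END_ELEMENT/s;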
I think you can simply repeat this match:
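sub parseData {
    # Put /s on the qr// itself: flags given on s/// do not apply inside an interpolated qr// object.
    my $re = qr/\bELEMENT\s+(\S+)\s+\(Item := "\Q$item\E"(?:(?!END_ELEMENT).)*?CatalogNumber := "\Q$catNum\E(?:(?!END_ELEMENT).)*?END_ELEMENT/s;
    # Each pass replaces the first remaining match, so the replacements run from the top down.
    foreach my $counter (0 .. $maxInstancesToReplace - 1) {
        # Stick in a temporary text placeholder that I will replace later after more processing
        $DataToParse =~ s/$re/***** Found $1\'s $catNum $item. (counter: $counter) *****/;
    }
    print("Here's the result:\n$DataToParse\n");
}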
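If repeating the substitution is not an option, use the /e regex modifier, which evaluates the replacement side as Perl code.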
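One way the /e suggestion could look is a single global substitution whose replacement code counts matches and stops rewriting after the limit. This is only a sketch: replace_first_n and the shortened sample data are hypothetical, and the pattern assumes element names contain no whitespace.

```perl
use strict;
use warnings;

# replace_first_n is a hypothetical helper, not part of the original program.
sub replace_first_n {
    my ($data, $item, $cat_num, $max) = @_;
    my $seen = 0;
    # (?!END_ELEMENT) keeps a single match from running into the next element.
    my $re = qr/\bELEMENT\s+(\S+)\s+\(Item := "\Q$item\E"(?:(?!END_ELEMENT).)*?CatalogNumber := "\Q$cat_num\E(?:(?!END_ELEMENT).)*?END_ELEMENT/s;
    # With /g every match is visited top-down; with /e the replacement is code.
    # $1 is the whole element, $2 is the name captured inside $re.
    $data =~ s{($re)}{
        $seen++ < $max
            ? sprintf("***** Found %s's %s %s. (counter: %d) *****",
                      $2, $cat_num, $item, $seen - 1)
            : $1    # past the limit: put the original text back unchanged
    }ge;
    return $data;
}

# Shortened, made-up sample in the same layout as the question's data.
my $sample = <<'END_DATA';
ELEMENT Kurt (Item := "SeaMonkeys",
CatalogNumber := "SMKY-1978/E",
END_ELEMENT
ELEMENT Joe (Item := "Pong",
CatalogNumber := "PONG-1482",
END_ELEMENT
ELEMENT Shane (Item := "SeaMonkeys",
CatalogNumber := "SMKY-1978/E",
END_ELEMENT
ELEMENT Mick (Item := "SeaMonkeys",
CatalogNumber := "SMKY-1978/F",
END_ELEMENT
END_DATA

my $result = replace_first_n($sample, 'SeaMonkeys', 'SMKY-1978', 2);
print $result;
```

Because the counting happens inside the replacement, the data is scanned once rather than once per instance, and matches past the limit are written back untouched.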
Answer 1 (score: -1)
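The best solution turned out to be extracting each ELEMENT ... END_ELEMENT section from the data and running the regex against one section at a time, rather than feeding the entire data set to the regex in one go. It is not exactly what I set out to do, but I rewrote my program to process the data piecemeal this way, and it works like a charm.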
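The rewritten program is not shown, so the following is only a guess at what section-by-section processing might look like. process_elements and the shortened sample data are hypothetical; split with a capturing group is used so that the text between elements is kept and the output can be reassembled.

```perl
use strict;
use warnings;

# process_elements is a hypothetical helper sketching the per-section approach.
sub process_elements {
    my ($data, $item, $cat_num, $max) = @_;
    my $replaced = 0;
    # The capturing group makes split return the ELEMENT ... END_ELEMENT
    # blocks themselves, interleaved with whatever text lies between them.
    my @chunks = split /(\bELEMENT\b.*?\bEND_ELEMENT\b)/s, $data;
    for my $chunk (@chunks) {    # $chunk aliases the array element
        last if $replaced >= $max;
        # Each test sees one section only, so nothing can match across elements.
        next unless $chunk =~ /\AELEMENT\s+(\S+)\s+\(Item := "\Q$item\E"/;
        my $owner = $1;
        next unless $chunk =~ /CatalogNumber := "\Q$cat_num\E/;
        $chunk = sprintf("***** Found %s's %s %s. (counter: %d) *****",
                         $owner, $cat_num, $item, $replaced);
        $replaced++;
    }
    return join '', @chunks;
}

# Shortened, made-up sample in the same layout as the question's data.
my $sample = <<'END_DATA';
ELEMENT Kurt (Item := "SeaMonkeys",
CatalogNumber := "SMKY-1978/E",
END_ELEMENT
ELEMENT Joe (Item := "Pong",
CatalogNumber := "PONG-1482",
END_ELEMENT
ELEMENT Shane (Item := "SeaMonkeys",
CatalogNumber := "SMKY-1978/E",
END_ELEMENT
ELEMENT Mick (Item := "SeaMonkeys",
CatalogNumber := "SMKY-1978/F",
END_ELEMENT
END_DATA

my $result = process_elements($sample, 'SeaMonkeys', 'SMKY-1978', 2);
print $result;
```

Working on one section at a time keeps each regex small and makes the top-down order explicit in the loop rather than implicit in the regex engine's scanning.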