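I have a file whose lines consist of fields like the following.
Sample line:
%a astuff,%b bstuff,%t this,%u that,%v this,%t that,%x the other,%xx only once,%q the other,%z the other,%c cstuff
The tag set is important for the search. Here is my sample tag set:
%t,%u,%v,%w,%x,%xx,%y,%z
I want to find the contents of any field whose tag is in the set and whose contents are repeated in a later field that is also tagged from the set. Here is my code, which fails: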
my $tagmrkr='%';
my $line='%a astuff,%b bstuff,%t this,%u that,%v this,%t that,%x the other,%xx only once,%q the other,%z the other,%c cstuff';
my $searchtags = qr/t|u|v|w|x|xx|y|z/; # excludes q
print qq/The line:$line\n\n/;
for ($line =~ m/
$tagmrkr$searchtags\ ([^\,]*,)
.*?
$tagmrkr$searchtags\ \1
/gx) {
print qq/First field contents:$1\n/;
print qq/Entire match:$&\n/;
print qq/\n/;
}
I expect:
The line:%a astuff,%b bstuff,%t this,%u that,%v this,%t that,%x the other,%xx only once,%q the other,%z the other,%c cstuff
First field contents:this,
Entire match:%t this,%u that,%v this,
First field contents:the other,
Entire match:%x the other,%xx only once,%q the other,%z the other,
I get:
The line:%a astuff,%b bstuff,%t this,%u that,%v this,%t that,%x the other,%xx only once,%q the other,%z the other,%c cstuff
First field contents:the other,
Entire match:%x the other,%xx only once,%q the other,%z the other,
First field contents:the other,
Entire match:%x the other,%xx only once,%q the other,%z the other,
Question 1:
Why are $& and $1 of the first match replaced with the values of the second match?
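Question 2: What should I change to get what I want (below) rather than what I expect?
What I want is to be able to rewind the match so that repeated fields are found even when matches overlap, i.e. when the first field of the second match occurs before the second field of the first match. For my immediate purposes, all I actually need is the repeated field contents.
That is, I want 3 matches in the example: one each for "this,", "that," and "the other,".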
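Answer 0 (score: 3)
One way to allow for overlap is to assert the presence of the rest of the phrase with a lookahead. That part is then not consumed, so the engine continues from just before it and can match it again: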
use warnings;
use strict;
use feature 'say';
my $s = q(%a astuff,%b bstuff,%t this,%u that,%v this,%t that,)
. q(%x the other,%xx only once,%q the other,%z the other,%c cstuff);
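my $m = qr/%/;                       # tag marker
my $t = qr/(?:t|u|v|w|x|xx|y|z)/;    # tags to search for (q is excluded)
# Capture one tagged field's contents, then only assert (via lookahead) that a later
# tagged field repeats them; that later field is not consumed, so it can begin the next match.
while ($s =~ / $m$t \s ([^,]+) , (?=(.*?$m$t\s\g{1},?)) /gx) {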
say "capture: $1";
say " whole: $1,$2";
}
Prints:
capture: this
 whole: this,%u that,%v this,
capture: that
 whole: that,%v this,%t that,
capture: the other
 whole: the other,%xx only once,%q the other,%z the other,
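Since the question only needs the repeated field contents, the lookahead from this answer can be wrapped in a subroutine that collects them. This is a sketch under assumptions: the name repeated_contents is hypothetical, the tags are passed as a list of tag names, and the answer's trailing ,? is tightened to (?:,|\z) so that a field such as "%u thistle" is not counted as a repeat of "this".

```perl
use strict;
use warnings;

# Hypothetical helper: returns the contents of every field whose tag is in @$tags
# and whose contents reappear in a later field that also has a tag from @$tags.
# The later field is only checked by a lookahead, so it is not consumed and can
# itself start the next match (this is what allows overlapping matches).
sub repeated_contents {
    my ($line, $tags) = @_;
    my $alt = join '|', map { quotemeta } @$tags;   # e.g. t|u|v|w|x|xx|y|z
    my $t   = qr/(?:$alt)/;                          # no capture groups inside
    my @found;
    # Group 1 is the field contents; the repeat must end at a comma or end of string.
    while ($line =~ /%$t\s([^,]+),(?=.*?%$t\s\g{1}(?:,|\z))/g) {
        push @found, $1;
    }
    return @found;
}

my $line = '%a astuff,%b bstuff,%t this,%u that,%v this,%t that,'
         . '%x the other,%xx only once,%q the other,%z the other,%c cstuff';
print "$_\n" for repeated_contents($line, [qw(t u v w x xx y z)]);
```

For the sample line this prints this, that and the other, one per line.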
Answer 1 (score: 0)
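Using a global match in a for loop returns all the matches at once (and then iterates over them), so the match variables are set from the last successful match before the iteration even begins. Using the global match in a while loop instead evaluates it in scalar context, so the match variables are correct on each iteration.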
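A small illustration of the difference, using a made-up string and pattern (not from the question). It mirrors the question's situation, where both iterations of the for loop printed the second match's values:

```perl
use strict;
use warnings;

my $s = 'a1 b2 c3';

# Scalar context (while loop): one match per iteration, so $1 always
# belongs to the match that was just made.
my @seen_in_while;
while ($s =~ /(\w)\d/g) {
    push @seen_in_while, $1;
}

# List context (for loop): all matches are collected before the loop body
# runs, so inside the body $1 is left over from the LAST successful match.
my @seen_in_for;
for my $cap ($s =~ /(\w)\d/g) {
    push @seen_in_for, "$cap/$1";    # $cap is each capture in turn; $1 stays 'c'
}

print "while: @seen_in_while\n";    # while: a b c
print "for:   @seen_in_for\n";      # for:   a/c b/c c/c
```

The while loop runs first here so that its final failed match resets pos($s) before the list-context match starts.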
You can get all three matches by resetting pos $line on each iteration, for example like this:
while ($line =~ m/
$tagmrkr$searchtags\ ([^\,]*,)
.*?
$tagmrkr$searchtags\ \1
/gx) {
pos $line = $-[0] + 1; # resume one character after this match's start, so the next match may overlap it
print qq/First field contents:$1\n/;
print qq/Entire match:$&\n/;
print qq/\n/;
}
Output:
The line:%a astuff,%b bstuff,%t this,%u that,%v this,%t that,%x the other,%xx only once,%q the other,%z the other,%c cstuff
First field contents:this,
Entire match:%t this,%u that,%v this,
First field contents:that,
Entire match:%u that,%v this,%t that,
First field contents:the other,
Entire match:%x the other,%xx only once,%q the other,%z the other,
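The pos reset can likewise be wrapped in a subroutine that returns only the repeated contents. A minimal sketch, assuming the marker is passed as a plain string and the tag alternation as a qr// object like $searchtags; the name repeated_fields is hypothetical. Unlike the lookahead version, each value keeps its trailing comma, as in the question's capture group.

```perl
use strict;
use warnings;

# Hypothetical helper: collects the captured "contents," of each tagged field
# that is repeated later under a tag from $tags. After each match, pos is moved
# back to one character past the match start, so the next search can find a
# match that overlaps this one.
sub repeated_fields {
    my ($line, $marker, $tags) = @_;
    my @found;
    while ($line =~ m/
        \Q$marker\E $tags \  ([^,]*,)    # tagged field; capture its contents plus comma
        .*?                              # anything in between, as little as possible
        \Q$marker\E $tags \  \1          # a later tagged field with the same contents
    /gx) {
        push @found, $1;
        pos($line) = $-[0] + 1;          # rewind to allow overlapping matches
    }
    return @found;
}

my $line = '%a astuff,%b bstuff,%t this,%u that,%v this,%t that,'
         . '%x the other,%xx only once,%q the other,%z the other,%c cstuff';
print "$_\n" for repeated_fields($line, '%', qr/t|u|v|w|x|xx|y|z/);
```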