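I have a homework problem where I need to use a regular expression to parse substrings out of a large string.
The goal is to select substrings that match the following criteria:
The substring begins and ends with the same uppercase character, and I need to ignore any instance of an uppercase character that is preceded by the digit 0.
For example, ZAp0ZuZAuX0AZA would contain the matches ZAp0ZuZ and AuX0AZA.
I have been at this for hours and honestly have not gotten anywhere...
I have tried a few things like the code below, but the first attempt selects everything from the first capital letter to the last capital letter: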
[A-Z]{1}[[:alnum:]]*[A-Z]{1} <--- this selects the whole string
[A-Z]{1}[[:alnum:]][A-Z]{1} <--- this gives me strings like ZuZ, AuX
I would really appreciate any help; I am quite stumped by this.
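For reference, the two attempts above can be reproduced with a short Python sketch. Python's `re` module does not support the POSIX class `[[:alnum:]]`, so `[A-Za-z0-9]` is substituted here as an assumption about what was intended; the variable names are illustrative only.

```python
import re

s = "ZAp0ZuZAuX0AZA"

# Greedy attempt: `*` consumes as much as possible, so the match runs
# from the first uppercase letter to the last one.
greedy = re.findall(r"[A-Z][A-Za-z0-9]*[A-Z]", s)

# Fixed-width attempt: exactly one character between two uppercase letters,
# with no requirement that the two letters be the same.
fixed = re.findall(r"[A-Z][A-Za-z0-9][A-Z]", s)

print(greedy)  # ['ZAp0ZuZAuX0AZA']
print(fixed)   # ['ZuZ', 'AuX', 'AZA']
```

Neither pattern ties the closing letter to the opening one, and neither looks at the character in front of a letter, which is what the task requires.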
Answer 0: (score: 1)
This might work:
(?<!0)([A-Z]).*?(?<!0)\1
https://regex101.com/r/nES9FP/1
Explanation
(?<! 0 ) # Ignore Upper case with zero in front of it
( [A-Z] ) # (1), This Upper case is to be found down stream
.*? # Lazy, any character
(?<! 0 ) # Ignore Upper case with zero in front of it
\1 # Backref to what is in group (1)
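The commented layout above maps directly onto a free-spacing pattern. Here is a minimal sketch, assuming Python and its `re.VERBOSE` flag (the question does not name a regex flavor; `PATTERN`, `sample`, and `matches` are illustrative names):

```python
import re

PATTERN = re.compile(
    r"""
    (?<!0)      # the opening letter must not be preceded by a zero
    ([A-Z])     # group 1: the uppercase letter to look for again downstream
    .*?         # lazy: as few characters as possible
    (?<!0)      # the closing letter must not be preceded by a zero
    \1          # backreference to the letter captured in group 1
    """,
    re.VERBOSE,
)

sample = "ZAp0ZuZAuX0AZA"
matches = [m.group() for m in PATTERN.finditer(sample)]
print(matches)  # ['ZAp0ZuZ', 'AuX0AZA']
```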
Answer 1: (score: 0)
You can use
(?<!0)([A-Z]).*?(?<!0)\1
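See the regex demo.
Details
(?<!0)([A-Z]) - Group 1: an ASCII uppercase letter that is not preceded by a zero
.*? - any characters other than line break characters, as few as possible
(?<!0)\1 - the same letter as in Group 1, not immediately preceded by 0.
See the Python demo: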
import re
s="ZAp0ZuZAuX0AZA"
for m in re.finditer(r'(?<!0)([A-Z]).*?(?<!0)\1', s):
    print(m.group())  # prints ZAp0ZuZ, then AuX0AZA
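Because `.` does not match line breaks by default, a pair of letters separated by a newline is not found. If such matches are wanted (the question does not say), a sketch assuming Python's `re.DOTALL` flag could look like this; the input string is made up for illustration:

```python
import re

pattern = r'(?<!0)([A-Z]).*?(?<!0)\1'
text = "Zab\ncZ"  # hypothetical input with a line break between the two Z's

# Default behavior: `.` stops at the newline, so nothing matches.
default = [m.group() for m in re.finditer(pattern, text)]

# With re.DOTALL, `.` also matches the newline.
dotall = [m.group() for m in re.finditer(pattern, text, re.DOTALL)]

print(default)  # []
print(dotall)   # ['Zab\ncZ']
```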
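Answer 2: (score: 0)
Doing this with a regular expression might not be the best idea, since you could simply split the string instead. However, if you do want to use one, this expression might give you an idea of what problems you may run into as the list of characters grows: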
(?=.[A-Z])([A-Z])(.*?)\1
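I added (?=.[A-Z]), which is meant to require that an uppercase letter be present. Note that, as written, it specifically checks that the character right after the first one is an uppercase letter. You can remove it and the expression will still work. However, to be safe, you can add boundaries like this to your expression.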
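One caveat worth checking: `(?=.[A-Z])` asserts that the character immediately after the match start is an uppercase letter, so removing it can change which strings match. Here is a small sketch, assuming Python's `re` and a made-up input `ZxZ` (all variable names are illustrative):

```python
import re

with_lookahead = r"(?=.[A-Z])([A-Z])(.*?)\1"
without_lookahead = r"([A-Z])(.*?)\1"

probe = "ZxZ"  # hypothetical input whose second character is lowercase

probe_with = re.search(with_lookahead, probe)
probe_without = re.search(without_lookahead, probe)
print(probe_with)             # None
print(probe_without.group())  # ZxZ

# On the question's sample the second character is 'A', so both agree.
sample = "ZAp0ZuZAuX0AZA"
sample_with = re.search(with_lookahead, sample).group()
sample_without = re.search(without_lookahead, sample).group()
print(sample_with)     # ZAp0Z
print(sample_without)  # ZAp0Z
```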
const regex = /([A-Z])(.*?)\1/gm;
const str = `ZAp0ZuZAuX0AZA
ZApxxZuZAuXxafaAZA
ZApxaf09xZuZAuX090xafaAZA
abcZApxaf09xZuZAuX090xafaAZA`;
let m;
while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility
import re
regex = r"([A-Z])(.*?)\1"
test_str = ("ZAp0ZuZAuX0AZA\n"
"ZApxxZuZAuXxafaAZA\n"
"ZApxaf09xZuZAuX090xafaAZA\n"
"abcZApxaf09xZuZAuX090xafaAZA")
matches = re.finditer(regex, test_str, re.MULTILINE)
for matchNum, match in enumerate(matches, start=1):
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum=matchNum, start=match.start(), end=match.end(), match=match.group()))
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum=groupNum, start=match.start(groupNum), end=match.end(groupNum), group=match.group(groupNum)))
# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
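It may be worth noting that `([A-Z])(.*?)\1` contains no check for a preceding 0, so on the question's sample it does not produce the two substrings the question asks for. A quick comparison, sketched in Python against the lookbehind pattern from the other answers (`plain` and `guarded` are illustrative names):

```python
import re

sample = "ZAp0ZuZAuX0AZA"

# Pattern from this answer: no lookbehind, so a letter after 0 still counts.
plain = [m.group() for m in re.finditer(r"([A-Z])(.*?)\1", sample)]

# Pattern from the other answers: letters preceded by 0 are skipped.
guarded = [m.group() for m in re.finditer(r"(?<!0)([A-Z]).*?(?<!0)\1", sample)]

print(plain)    # ['ZAp0Z', 'ZAuX0AZ']
print(guarded)  # ['ZAp0ZuZ', 'AuX0AZA']
```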