import re
line = "..12345678910111213141516171820212223"
regex = re.compile(r'((?:[a-zA-Z0-9])\1+)')
print ("not coming here")
matches = re.findall(regex,line)
print (matches)
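In the code above, I am trying to capture groups of repeated characters.
So, for example, I need results like this: 111 222 and so on.
But when I run the code above, I get this error: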
Traceback (most recent call last):
  File "First.py", line 3, in <module>
    regex = re.compile(r'((?:[a-zA-Z0-9])\1+)')
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\re.py", line 224, in compile
    return _compile(pattern, flags)
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\re.py", line 293, in _compile
    p = sre_compile.compile(pattern, flags)
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_compile.py", line 536, in compile
    p = sre_parse.parse(p, flags)
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_parse.py", line 829, in parse
    p = _parse_sub(source, pattern, 0)
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_parse.py", line 437, in _parse_sub
    itemsappend(_parse(source, state))
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_parse.py", line 778, in _parse
    p = _parse_sub(source, state)
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_parse.py", line 437, in _parse_sub
    itemsappend(_parse(source, state))
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_parse.py", line 524, in _parse
    code = _escape(source, this, state)
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_parse.py", line 415, in _escape
    len(escape))
sre_constants.error: cannot refer to an open group at position 16
Can someone please point out where I am going wrong?
Answer 0 (score: 2)
You (probably) want
([a-zA-Z0-9])\1+
In Python:
import re
line = "..12345678910111213141516171820212223"
regex = re.compile(r'([a-zA-Z0-9])\1+')
matches = [match.group(0) for match in regex.finditer(line)]
print (matches)
# ['111', '222']
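Answer 1 (score: 2)
A backreference cannot point to a group that is still open, that is, to the group it sits inside. In your pattern, \1 appears before the closing parenthesis of group 1, so the regex cannot compile. If you just want to print out the repeated characters, you can use a small hack with re.sub: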
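import re
line = "..12345678910111213141516171820212223"
def foo(m):
    # Print each run of repeated characters as it is matched
    print(m.group(0))
    return ''
_ = re.sub(r'(\w)\1+', foo, line) # use [a-zA-Z0-9] if you don't want to match underscores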
111
222
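To make the cause of the error concrete, here is a minimal sketch that reproduces the failure and then compiles a working pattern. The variable names (`broken`, `fixed`, `error_message`, `runs`) are mine, chosen for illustration; the patterns and the input string come from the question.

```python
import re

line = "..12345678910111213141516171820212223"

# \1 sits inside group 1, which has not been closed yet,
# so compiling this pattern raises re.error.
broken = r'((?:[a-zA-Z0-9])\1+)'
try:
    re.compile(broken)
    error_message = None
except re.error as exc:
    error_message = str(exc)
print(error_message)

# Here group 1 is closed before \1 refers to it, so the pattern compiles.
fixed = re.compile(r'([a-zA-Z0-9])\1+')
runs = [m.group(0) for m in fixed.finditer(line)]
print(runs)  # ['111', '222']
```

The only difference between the two patterns is where the capturing parenthesis closes relative to the backreference.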
Answer 2 (score: 1)
It is possible to do this with .findall, but doing it with .finditer is simpler, as shown in Jan's answer.
import re
line = "..12345678910111213141516171820212223"
regex = re.compile(r'(([a-zA-Z0-9])\2+)')
matches = [t[0] for t in regex.findall(line)]
print(matches)
Output
['111', '222']
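We use \2 because groups are numbered by their opening parenthesis: \1 refers to the pattern in the outer parentheses, while \2 refers to the pattern in the inner parentheses.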
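The t[0] indexing is needed because re.findall returns tuples when a pattern has more than one capturing group, and only the captured text when it has exactly one. A small sketch of that difference, using the same input; the names `two_groups`, `one_group`, `pairs` and `singles` are just for illustration:

```python
import re

line = "..12345678910111213141516171820212223"

# Two capturing groups: findall yields one tuple per match,
# (group 1 = whole run, group 2 = the single repeated character).
two_groups = re.compile(r'(([a-zA-Z0-9])\2+)')
pairs = two_groups.findall(line)
print(pairs)  # [('111', '1'), ('222', '2')]

# One capturing group: findall yields only that group's text,
# so the full run is lost and only the repeated character remains.
one_group = re.compile(r'([a-zA-Z0-9])\1+')
singles = one_group.findall(line)
print(singles)  # ['1', '2']
```

This is why the single-group pattern works with finditer and group(0) but needs the extra outer group to work with findall.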