Question

I'm trying to use a regex function to accomplish the following. My input looks like this:

A:L1, A:K2, A:E3, A:A4, A:E5, A:H7,
,EHKKDH,6,LKEAELH,7

I want to write a regex substitution that splits

,EHKKDH,6,LKEAELH,7
to:
,EHKKDH,6,
 (blankline)
 LKEAELH,7

The code that looks for the place to split is:

import re
with open('masterfile.txt', 'r') as f:
    content = f.read()
y = str(content)
badpattern = re.compile(r'\d,\w')
goodpattern = re.compile(r'\d,\n\w')
x = re.sub(badpattern,goodpattern,y)
print(x)

When I use goodpattern as the replacement, I get the following error.

 File "myprogram.py", line 55, in <module>
x = re.sub(badpattern,goodpattern,y)
File "/Users/Jay/anaconda3/lib/python3.7/re.py", line 192, in sub
return _compile(pattern, flags).sub(repl, string, count)
File "/Users/Jay/anaconda3/lib/python3.7/re.py", line 309, in _subx
template = _compile_repl(template, pattern)
File "/Users/Jay/anaconda3/lib/python3.7/re.py", line 300, in _compile_repl
return sre_parse.parse_template(repl, pattern)
File "/Users/Jay/anaconda3/lib/python3.7/sre_parse.py", line 954, in parse_template
s = Tokenizer(source)
File "/Users/Jay/anaconda3/lib/python3.7/sre_parse.py", line 228, in __init__
string = str(string, 'latin1')
TypeError: decoding to str: need a bytes-like object, re.Pattern found

My code works if I pass a plain string such as 'works' where goodpattern is; I then get the following output:

,EHKKDH,worksKEAELH,7

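I need to be able to use a regular expression for these substitutions. The pattern will always be a number, then a comma, then a letter.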

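Also, how would I write these changes back so that they replace the matches in the original file? I understand the replace method, but even after consulting the manual I'm still struggling with re.sub. Thanks for your help!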

Answer 1

I usually use the site regexr.com to build this kind of regular expression.

Anyway, as you said, the pattern is word , number ,.

Turning that into a regex:

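word -> \w+ or [a-zA-Z]+ (\w also matches digits and the underscore, while [a-zA-Z] matches only the characters A to Z and a to z. The + is added to match at least one character.)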

, -> ,

number -> \d+

The final regex would then be \w+,\d+, (or [a-zA-Z]+,\d+, if the word should contain letters only).
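
To make the mapping above concrete, here is a small sketch that runs the pieces against the sample line from the question. It assumes the intended combined pattern is word, comma, number, comma, written here as [a-zA-Z]+,\d+, and the variable names are only illustrative.

```python
import re

# Difference between the two "word" classes on a made-up token:
# \w+ also accepts digits and underscores, [a-zA-Z]+ accepts letters only.
word_class = re.findall(r'\w+', 'A4_x')
letter_class = re.findall(r'[a-zA-Z]+', 'A4_x')
print(word_class)    # ['A4_x']
print(letter_class)  # ['A', 'x']

# Assumed combined pattern: word, comma, number, comma.
pattern = re.compile(r'[a-zA-Z]+,\d+,')

line = ',EHKKDH,6,LKEAELH,7'
matches = pattern.findall(line)
print(matches)  # ['EHKKDH,6,']
```

Only the first record matches, because the second one (LKEAELH,7) has no trailing comma. This pattern finds the records, but it does not insert the line break by itself.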

Answer 2

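The second argument to re.sub (the replacement) must be a string (or a function), not another compiled regular expression. In your case, put capture groups around the text before and after the point where you want to insert the newline (or two newlines, if you want a blank line), so that you can refer to them with backreferences in the replacement string: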

x = re.sub(r'(\d,)(\w)', r'\1\n\2', y)
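
As a fuller sketch of the same idea, the snippet below applies the substitution to the sample input from the question, shows the blank-line variant with two newlines, and adds a small helper for writing the result back to the file. The helper name split_records_in_file is made up for illustration, and overwriting the file in place is an assumption about what "replace the matches in the original file" should mean.

```python
import re

# Sample input copied from the question.
text = 'A:L1, A:K2, A:E3, A:A4, A:E5, A:H7,\n,EHKKDH,6,LKEAELH,7\n'

# Group 1 keeps the digit and comma, group 2 keeps the character that follows.
split_pattern = re.compile(r'(\d,)(\w)')

# One newline: the second record starts on the next line.
one_newline = split_pattern.sub(r'\1\n\2', text)

# Two newlines: a blank line separates the two records.
blank_line = split_pattern.sub(r'\1\n\n\2', text)
print(blank_line)


def split_records_in_file(path):
    """Hypothetical helper: read the file, apply the substitution, write it back."""
    with open(path, 'r') as f:
        content = f.read()
    updated = split_pattern.sub(r'\1\n\n\2', content)
    with open(path, 'w') as f:
        f.write(updated)
    return updated
```

Calling split_records_in_file('masterfile.txt') would then rewrite the file from the question in place. The first input line is left alone because every digit-comma pair there is followed by a space or a newline, which \w does not match.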
