Question

I am using the sub() function to replace ===Something here=== with <h2>Something here</h2>.

The following works:

line = sub(r"(===)([a-zA-Z\s]*)(===)", r"<h2>\2</h2>", line)

when the original content is:

===Something here===

However, it does not work when the original content is:

===
Something here
===

I tried something like this:

line = sub(r"(===\n)([a-zA-Z\s]*)(===)", r"<h2>\2</h2>", line)

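(The only change is the \n added to the first group.)

But I think this forces the pattern to contain a newline, rather than making the newline an optional part of the pattern.

How can I extend the current pattern so that it is flexible enough to recognise instances where a newline may be present?

Edit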

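I have tried the suggestions below (at the time of writing) and they do not work properly. The only reason I can think of is that there may be other characters in the lines.

The images below are screenshots of the original text file (opened in the SciTE editor with "End of Line" and "Whitespace" set to visible), which is read with: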

original_text_file = open('file.txt', 'U')

Single-line instance:

enter image description here

Multi-line instance:

enter image description here

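I don't know whether I need to make additional allowances for these characters?

Edit 2:

Results of testing another solution from below (this does not perform the replacement on the multi-line instance):

Python code: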

from re import *

def test_function(text_file):
    file_object = open(text_file+'.txt', 'U')
    for line in file_object:
        line = sub(r"\n?(===)\n?([a-zA-Z\s]*?)\n?(===)\n?\n?", r"<h2>\2</h2>", line)
        print line

test_function('my_file')

my_file.txt:

===Something here===

Lorem ipsum lala.  

===
Something here
===

Loreum ipsum lala.

Answer 1

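I think using a regular expression here is appropriate. Your expression is close to what you need. After the \n you need a ?, which matches 0 or 1 of the preceding character, in this case 0 or 1 \n. This has to be added in several places to cope with the possible newlines. You also have to stop the \s in the target capture group from consuming the optional \n (hence the non-greedy *?), otherwise you end up with a \n in the output.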

import re
pat = r'\n?(===)\n?([a-zA-Z\s]*?)\n?(===)\n?\n?'
rep = r'<h2>\2</h2>'

print(repr(re.sub(pat,rep,"""
=== Something here ===
""")))
print(repr(re.sub(pat,rep,"""===
Something here
===""")))

Output

>>> 
'<h2> Something here </h2>'
'<h2>Something here</h2>'
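A minimal sketch of why the non-greedy `*?` matters, reusing the pattern above. The names `sample`, `greedy_pat` and `lazy_pat` are introduced here purely for illustration. With a greedy `*`, the character class (which includes `\s`) takes the trailing newline before the optional `\n?` gets a chance, so the newline ends up inside the heading:

```python
import re

sample = "===\nSomething here\n==="  # hypothetical multi-line heading
rep = r"<h2>\2</h2>"

# Greedy: [a-zA-Z\s]* also consumes the newline before the closing ===.
greedy_pat = r"\n?(===)\n?([a-zA-Z\s]*)\n?(===)\n?\n?"
# Lazy: [a-zA-Z\s]*? stops as early as it can, leaving that newline to \n?.
lazy_pat = r"\n?(===)\n?([a-zA-Z\s]*?)\n?(===)\n?\n?"

greedy_result = re.sub(greedy_pat, rep, sample)
lazy_result = re.sub(lazy_pat, rep, sample)

print(repr(greedy_result))
print(repr(lazy_result))
```

The only difference between the two patterns is the `?` after the `*` in the second group.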

I copied and pasted the text from SciTE, like you did:

enter image description here

For the multi-line case I would suggest:

import re
patSearch = r'\n?===\n?[a-zA-Z\s]*?\n?==='
patReplace = r'\n?(===)\n?([a-zA-Z\s]*?)\n?(===)\n?\n?'
replacement  = r'<h2>\2</h2>'

Using the string t:

t="""===Something here===

Lorem ipsum lala.  

===
Something here
===

Loreum ipsum lala."""

the following

matches = re.findall(patSearch, t)  # get every === ... === style string
for match in matches:
    print(re.sub(patReplace, replacement, match))  # do the replacement in each one

produces

>>> 
<h2>Something here</h2>
<h2>Something here</h2>
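As a possible simplification (a sketch, not part of the answer above): `re.sub` already replaces every non-overlapping match, so the replacement pattern can be applied to the whole string directly, without a separate `findall` pass. The names `pat_replace` and `converted` are illustrative. One side effect worth knowing about is that the optional `\n?` parts also consume the newlines around each heading, so the following paragraph is pulled up against the closing tag:

```python
import re

pat_replace = r"\n?(===)\n?([a-zA-Z\s]*?)\n?(===)\n?\n?"
replacement = r"<h2>\2</h2>"

# Same content as the string t above, spelled out with explicit newlines.
t = (
    "===Something here===\n"
    "\n"
    "Lorem ipsum lala.  \n"
    "\n"
    "===\n"
    "Something here\n"
    "===\n"
    "\n"
    "Loreum ipsum lala."
)

# One call handles both the single-line and the multi-line heading.
converted = re.sub(pat_replace, replacement, t)
print(converted)
```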

Answer 2

I suggest this solution:

import re
s = """===Something here===

Lorem ipsum lala.  

===
Something here
===

Loreum ipsum lala.  """
result = re.sub(r"===(.*?)===", r"<h2>\1</h2>", s, flags=re.DOTALL)
print(result)

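Some explanation:

.*? matches any characters in "non-greedy" mode: it matches as little as possible. This prevents ===First=== lalala ===Second=== from being replaced with <h2>First=== lalala ===Second</h2>
flags=re.DOTALL means that . matches any character, including newlines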
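The two points above can be checked with a small sketch. The sample strings and variable names are illustrative only:

```python
import re

one_line = "===First=== lalala ===Second==="
multi_line = "===\nSomething here\n==="

# Greedy .* runs to the last ===, merging both headings into one.
greedy = re.sub(r"===(.*)===", r"<h2>\1</h2>", one_line)
# Non-greedy .*? stops at the first closing ===.
lazy = re.sub(r"===(.*?)===", r"<h2>\1</h2>", one_line)

# Without DOTALL, . cannot cross the newlines, so nothing is replaced.
no_dotall = re.sub(r"===(.*?)===", r"<h2>\1</h2>", multi_line)
# With DOTALL the heading matches; the newlines stay inside the group.
with_dotall = re.sub(r"===(.*?)===", r"<h2>\1</h2>", multi_line, flags=re.DOTALL)

for value in (greedy, lazy, no_dotall, with_dotall):
    print(repr(value))
```

Note that with re.DOTALL alone the captured text keeps its surrounding newlines; strip them separately if that matters for the output.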

Note that you need to apply sub() to the whole file, not line by line
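A short sketch of why line-by-line processing cannot work for the multi-line heading: each call to sub() only ever sees one line, so the opening and closing === are never in the same string. The string `text` stands in for the file contents and is an assumption for illustration:

```python
import re

pattern = r"===(.*?)==="
text = "===\nSomething here\n===\n"

# Line by line: each call sees only "===\n", "Something here\n" or "===\n".
per_line = "".join(
    re.sub(pattern, r"<h2>\1</h2>", line, flags=re.DOTALL)
    for line in text.splitlines(True)  # True keeps the line endings
)

# Whole text: both === markers are visible to a single call.
whole = re.sub(pattern, r"<h2>\1</h2>", text, flags=re.DOTALL)

print(repr(per_line))
print(repr(whole))
```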

Answer 3

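Compile the regular expression with the re.DOTALL flag: this makes the . character match newlines as well. $ should be used to force the end of the pattern. You no longer have to use \s as in Blender's solution.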
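A minimal sketch of the compiled-pattern approach. It does not attempt the `$` anchor mentioned above, and the `to_h2` helper and the `.strip()` call are assumptions added for illustration, not part of the answer:

```python
import re

# Compile once with DOTALL so that . also matches newlines.
heading = re.compile(r"===(.*?)===", re.DOTALL)

def to_h2(text):
    """Hypothetical helper: wrap each === ... === span in <h2> tags."""
    # A replacement function lets us strip the newlines captured by the group.
    return heading.sub(lambda m: "<h2>%s</h2>" % m.group(1).strip(), text)

print(to_h2("===\nSomething here\n==="))
print(to_h2("===Something here==="))
```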

Answer 4

Allow whitespace between the capture groups:

re.sub(r"(===)\s*([a-zA-Z\s]*?)\s*(===)", r"<h2>\2</h2>", line)

You can also use a non-greedy capture group:

re.sub(r"(===)\s*(.*?)\s*(===)", r"<h2>\2</h2>", line)
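A quick comparison of the two patterns above, as a sketch. The sample strings are made up; `with_digit` is included on the assumption that real headings may contain characters outside `[a-zA-Z\s]`, which the first pattern silently skips:

```python
import re

letters_only = r"(===)\s*([a-zA-Z\s]*?)\s*(===)"
any_chars = r"(===)\s*(.*?)\s*(===)"
rep = r"<h2>\2</h2>"

multi_line = "===\nSomething here\n==="
with_digit = "===Part 2==="  # hypothetical heading containing a digit

# Both patterns handle the multi-line heading, because \s* eats the newlines.
a = re.sub(letters_only, rep, multi_line)
b = re.sub(any_chars, rep, multi_line)

# Only the second pattern matches once the heading has a non-letter in it.
c = re.sub(letters_only, rep, with_digit)
d = re.sub(any_chars, rep, with_digit)

for value in (a, b, c, d):
    print(repr(value))
```

As with the other answers, `line` has to hold the whole text for the multi-line case to match.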

Answer 5

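User1063287, if you are still having trouble, I recommend the solution Zac posted. I had a similar problem, and the "re.DOTALL" flag was the trick that let me make the substitution I intended. My problem also involved reading text from a .txt file. Here is a suggestion, based on the code that worked for me, for how to write it for your specific problem (note that I save the output to a new .txt file):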

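import re

with open('output.txt', 'w') as o:
    with open('input', 'r') as i:
        # Read the whole file at once so the pattern can span several lines.
        text = i.read()
    # Capture the heading text in group 1; DOTALL lets . match newlines.
    text = re.sub(r"===(.*?)===", r"<h2>\1</h2>", text, flags=re.DOTALL)
    o.write(text)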

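The with statement closes your input and output files when its block finishes, and i.read() reads the whole file at once (rather than accessing it line by line). I don't see why you couldn't put this code into a def function, but I haven't tried it.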
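For what it's worth, wrapping this in a function could look like the sketch below. The names `convert_text` and `convert_file` and their parameters are hypothetical, chosen here for illustration; splitting the text transformation from the file handling keeps the regex part easy to test on plain strings:

```python
import re

def convert_text(text):
    """Hypothetical helper: turn every ===x=== span into <h2>x</h2>."""
    # DOTALL lets . match newlines, so headings may span several lines.
    return re.sub(r"===(.*?)===", r"<h2>\1</h2>", text, flags=re.DOTALL)

def convert_file(src_path, dst_path):
    """Hypothetical helper: read src_path whole, convert it, write dst_path."""
    with open(src_path, "r") as src:
        text = src.read()  # whole file at once, not line by line
    with open(dst_path, "w") as dst:
        dst.write(convert_text(text))

print(convert_text("===Title===\nbody\n"))
```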

Good luck!

How do I match a multi-line pattern with sub()?
