Question

I am parsing the lines of a file, and I want to remove everything between "{%" and "%}", since these delimit comments.

More specifically, a string such as

bla{% comment %} bli {% useless %}blu

should return

bla bli blu

I tried using a regular expression and removing everything matched by {% .* %}:

import re
s = 'bla{% comment %} bli {% useless %}blu'
regexp = '{% .* %}'
comments = re.findall(regexp, s)
for comment in comments:
    s = s.replace(comment, '')
print(s)  # blablu

This gives blablu and erases bli as well. While I understand why it behaves that way, I don't know how to get bla bli blu.

Answer 1

You need .*?. Your .* is greedy.

regexp = '{% .*? %}'

When the quantifier is greedy, it consumes "as much as it can" while still producing a match, which means it runs from the first {% to the last %}:

bla{% comment %} bli {% useless %}blu
   ^ here        ...            ^ to here

When the quantifier is lazy, it consumes "as little as it can" while still producing a match, which means it runs from a {% to the next %}.

It is also better not to include the spaces explicitly, because that pattern will not match comments written without spaces:

regexp = '{%.*?%}'
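
The difference between the three patterns can be checked directly. This is a minimal sketch; the variable names and the second input string (one comment without spaces) are made up for illustration.

```python
import re

s = 'bla{% comment %} bli {% useless %}blu'

# Greedy: .* runs from the first {% to the last %}, so " bli " is lost too.
greedy = re.sub(r'{% .* %}', '', s)

# Lazy: .*? stops at the nearest %}, so each comment is removed separately.
lazy = re.sub(r'{% .*? %}', '', s)

# Hypothetical input mixing a comment without spaces and one with spaces.
no_space = 'a{%x%}b{% y %}c'
strict = re.sub(r'{% .*? %}', '', no_space)  # requires the literal spaces
loose = re.sub(r'{%.*?%}', '', no_space)     # spaces are optional

print(greedy)  # blablu
print(lazy)    # bla bli blu
print(strict)  # a{%x%}bc
print(loose)   # abc
```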

Answer 2

You should use re.sub() and make the regex non-greedy by adding ?.

import re
s = 'bla{% comment %} bli {% useless %}blu'
regexp = '{% .*? %}'
s = re.sub(regexp, "", s)
print(s) # bla bli blu
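
Since the question is about the lines of a file, the same substitution could be wrapped in a small helper and applied line by line. This is only a sketch: the names COMMENT_RE and strip_comments are hypothetical, and a plain list stands in for the file.

```python
import re

# Compiled once and reused for every line (hypothetical name).
COMMENT_RE = re.compile(r'{%.*?%}')

def strip_comments(line):
    """Return the line with every {% ... %} span removed."""
    return COMMENT_RE.sub('', line)

# Stand-in for the lines of a file (assumed sample data).
lines = [
    'bla{% comment %} bli {% useless %}blu',
    'no comment here',
    '{%only a comment%}',
]
cleaned = [strip_comments(line) for line in lines]
print(cleaned)  # ['bla bli blu', 'no comment here', '']
```

Because . does not match a newline by default, this removes only comments that open and close on the same line.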

Answer 3

This is just a supplementary explanation, posted as an answer because of its length.

Lazy alternatives (without using the dot)

{% [^\W]+ %}       
{% [^\W]* %}
{% [^\W]+? %}
{% [^\W]*? %}
{% [\w]+ %}
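
A quick check of the character-class patterns, as a sketch with a made-up second input: [^\W] ("not a non-word character") matches the same characters as \w, so these patterns cover only comments made of a single word with one space on each side.

```python
import re

s = 'bla{% comment %} bli {% useless %}blu'

# Both spellings of the class give the same result on the sample string.
a = re.sub(r'{% [^\W]+ %}', '', s)
b = re.sub(r'{% \w+ %}', '', s)
print(a)  # bla bli blu
print(b)  # bla bli blu

# A comment containing a space is not a single run of word characters,
# so the pattern leaves it in place (hypothetical input).
multi = 'x{% two words %}y'
c = re.sub(r'{% \w+ %}', '', multi)
print(c)  # x{% two words %}y
```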

Lazy variation (without using the asterisk)

{% .+? %}
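
One difference between .+? and .*? is worth seeing in action: .+? must consume at least one character, so an empty comment written as {%  %} cannot be matched on its own and the match may extend to a later %}. The input below is a made-up example.

```python
import re

# Hypothetical input: an empty comment followed by a normal one.
s = 'a{%  %}b{% c %}d'

# .*? may match zero characters, so each comment is removed on its own.
star = re.sub(r'{% .*? %}', '', s)

# .+? needs at least one character between the two spaces, so the match
# starting at the first {% only ends at the second %} and swallows "b".
plus = re.sub(r'{% .+? %}', '', s)

print(star)  # abd
print(plus)  # ad
```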

Remove a pattern of the form "{%...%}" from a string
