Question

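Although this question is similar to this thread,

I think I may be doing something wrong when building the code around the regular expression.

I want to match everything on a line up to a comment ("#"), or up to the end of the line if there is no comment.

The regular expression I am using is: (.*)(#|$)

(.*) = everything, (#|$) = a comment or the end of the line

The code: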

OPTION = re.compile(r'(?P<value>.*)(#|$)')
file = open('file.txt')
lines = file.read()
for line in lines.split('\n'):
    get_match = OPTION.match(line)
    if get_match:
        line_value = get_match.group('value')
        print "Match=  %s" % line_value

The above works, but it does not strip the comment. If the file has a line like:

this is a line   # and this is a comment

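I still get the whole line when I run the code.

Am I missing some other value or piece of information in the regular expression, or do I need to change the code?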

Answer 1

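The * is greedy (it consumes as much of the string as it can), so .* consumes the entire line, going past the # and on to the end of the line. Change it to ".*?" and it will work.

See the Regular Expression HOWTO for details.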
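To make the difference concrete, here is a minimal sketch (written for Python 3, unlike the Python 2 code in the question) that runs the question's pattern and the lazy variant on the sample line. The variable names are only for illustration.

```python
import re

line = "this is a line   # and this is a comment"

# Greedy: .* first grabs the whole line, and (#|$) is then satisfied by the end of the line.
greedy = re.compile(r'(?P<value>.*)(#|$)')
# Lazy: .*? grows one character at a time and stops at the first '#' (or at the end of the line).
lazy = re.compile(r'(?P<value>.*?)(#|$)')

greedy_value = greedy.match(line).group('value')
lazy_value = lazy.match(line).group('value')

print(repr(greedy_value))  # 'this is a line   # and this is a comment'
print(repr(lazy_value))    # 'this is a line   '
```

Note that the lazy version still keeps the spaces before the #; calling .rstrip() on the result would remove them if that matters.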

Answer 2

This is the correct regular expression for what you want to do:

([^#]*)(#.*)?

Also, why don't you use

file = open('file.txt')
for line in file:
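Putting the two suggestions together might look roughly like the sketch below (Python 3). The values helper, the io.StringIO stand-in for open('file.txt'), and the trailing-whitespace stripping are my assumptions for the sake of a runnable example, not part of the answer.

```python
import io
import re

# ([^#]*) captures everything before the first '#'; (#.*)? optionally captures the comment.
OPTION = re.compile(r'([^#]*)(#.*)?')

def values(lines):
    """Yield the non-comment part of each line, without the newline or trailing spaces."""
    for line in lines:
        match = OPTION.match(line.rstrip('\n'))
        yield match.group(1).rstrip()

# io.StringIO stands in for open('file.txt') so the sketch runs without a file on disk.
sample = io.StringIO(
    "this is a line   # and this is a comment\n"
    "no comment here\n"
    "# only a comment\n"
)
result = list(values(sample))
print(result)  # ['this is a line', 'no comment here', '']
```

Because both groups can match the empty string, this pattern always matches, so there is no need for an "if match" check here.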

Answer 3

@Can, @Benji and @ΤΖΩΤΖΙΟΥ gave three excellent solutions, and it's fun to see how fast each of them matches (that's what timeit is for: fun, meaningless micro-benchmarks ;-). E.g.:

$ python -mtimeit -s'import re; r=re.compile(r"([^#]*)(#.*)?"); s="this is a line   # and this is a comment"' 'm=r.match(s); g=m.group(1)'
100000 loops, best of 3: 2.02 usec per loop

VS

$ python -mtimeit -s'import re; r=re.compile(r"^(.*?)(?:#|$)"); s="this is a line   # and this is a comment"' 'm=r.match(s); g=m.group(1)'
100000 loops, best of 3: 4.19 usec per loop

VS

$ python -mtimeit -s'import re; r=re.compile(r"(.*?)(#|$)"); s="this is a line   # and this is a comment"' 'm=r.match(s); g=m.group(1)'
100000 loops, best of 3: 4.37 usec per loop

And the winner is... a mixed pattern! :-)

$ python -mtimeit -s'import re; r=re.compile(r"(.*?)(#.*)?"); s="this is a line   # and this is a comment"' 'm=r.match(s); g=m.group(1)'
1000000 loops, best of 3: 1.73 usec per loop

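Disclaimer: of course, if this were a real benchmarking exercise and speed really mattered, one would try many different and relevant values of s, run tests that go beyond such a micro-benchmark, and so on. Still, I find timeit an inexhaustible source of fun! :-)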
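For anyone who wants to repeat this from a script rather than the shell, a rough sketch using timeit.timeit follows (Python 3; the loop and the captured dictionary are my own scaffolding). It also prints what group(1) captures, since timings only mean something if the patterns return the same thing. Absolute numbers will differ by machine and Python version, so none are shown.

```python
import re
import timeit

s = "this is a line   # and this is a comment"
patterns = [
    r"([^#]*)(#.*)?",
    r"^(.*?)(?:#|$)",
    r"(.*?)(#|$)",
    r"(.*?)(#.*)?",
]

captured = {}
for p in patterns:
    r = re.compile(p)
    # Record what the first group actually captures for this input.
    captured[p] = r.match(s).group(1)
    # Time 100000 match-and-extract calls, mirroring the command-line runs.
    seconds = timeit.timeit(lambda: r.match(s).group(1), number=100000)
    print("%-18s %-22r %.3f s per 100000 calls" % (p, captured[p], seconds))
```

On this input the first three patterns capture 'this is a line   ', while the mixed pattern (.*?)(#.*)? captures the empty string: the lazy .*? is allowed to match nothing and the optional (#.*)? is then skipped. That may well be why it times fastest, so it is worth checking the captured value before adopting it.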

Answer 4

Use this regular expression:

^(.*?)(?:#|$)

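With the non-greedy modifier (?), the .* expression matches as little as possible and stops as soon as it reaches a hash sign or the end of the line. By default it matches as much as possible, which is why you always get the whole line.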
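As a small illustration, this pattern could be wrapped in a helper like the one below (Python 3). The name strip_comment and the .rstrip() call are assumptions on my part, and the sketch assumes each input is a single line without embedded newlines.

```python
import re

# ^ anchors at the start, (.*?) lazily captures the value, and (?:#|$) is a
# non-capturing group that ends the match at '#' or at the end of the line.
VALUE = re.compile(r'^(.*?)(?:#|$)')

def strip_comment(line):
    """Return the part of `line` before the first '#', without trailing whitespace."""
    return VALUE.match(line).group(1).rstrip()

print(repr(strip_comment("this is a line   # and this is a comment")))  # 'this is a line'
print(repr(strip_comment("no comment on this line")))  # 'no comment on this line'
print(VALUE.groups)  # 1, because (?:...) does not create a capture group
```

Since match() already anchors at the start of the string, the leading ^ is redundant there, but it keeps the pattern correct if it is later used with search().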

Regular expression to match up to a comment or the end of the line
