Question

I want to capture only the lines that end with two asterisks, using the following code:

import re
total_lines = 0
processed_lines = 0
regexp = re.compile(r'[*][\s]+[*]$')


for line in open('testfile.txt', 'r'):
    total_lines += 1
    if regexp.search(line):
        print 'Line not parsed. Format not defined yet'
    else:
        processed_lines += 1
print "Total lines: {} - Processed lines: {}".format(total_lines, processed_lines)

It works fine on Windows, but when I run the same code on CentOS the regex doesn't match. This is the output for testfile.txt (a 40-line file):

Windows re.__version__ = '2.2.1':

Line not parsed. Format not defined yet
Line not parsed. Format not defined yet
Line not parsed. Format not defined yet
Line not parsed. Format not defined yet
Line not parsed. Format not defined yet
Total lines: 40 - Processed lines: 35

Linux re.__version__ = '2.2.1':

Total lines: 40 - Processed lines: 40

Both operating systems use the same Python version. You can find testfile.txt here and here:

Answer 1

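Open the file in universal newline mode ('rU'). In Python 2.x this supports I/O on files whose newline format is not the platform's native one, so the $ in your regex then matches at the end of each line.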

import re
total_lines = 0
processed_lines = 0
regexp = re.compile(r'[*][\s]+[*]$')

# 'rU' = universal newline mode: '\r\n' and '\r' are translated to '\n' on read
for line in open('testfile.txt', 'rU'):
    total_lines += 1
    if regexp.search(line):
        print 'Line not parsed. Format not defined yet'
    else:
        processed_lines += 1
print "Total lines: {} - Processed lines: {}".format(total_lines, processed_lines)

PEP 278 explains what rU stands for:

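In a Python with universal newline support open() the mode parameter can also be "U", meaning "open for input as a text file with universal newline interpretation". Mode "rU" is also allowed, for symmetry with "rb".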
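
To see why the open mode matters, here is a minimal sketch that uses in-memory strings instead of the real testfile.txt. The sample line is made up for illustration, and the sketch assumes the file was saved with Windows-style '\r\n' line endings. The thread does not confirm that, but it would explain the symptom: $ matches only at the very end of the string or just before a final '\n', so a leftover '\r' on Linux blocks the match.

```python
import re

regexp = re.compile(r'[*][\s]+[*]$')

# Hypothetical sample line; the real testfile.txt is not shown in the thread.
unix_line = '* some text *   *\n'       # what Windows text mode or 'rU' yields
windows_line = '* some text *   *\r\n'  # what Linux yields for a CRLF file opened with 'r'

matches_unix = bool(regexp.search(unix_line))
matches_windows = bool(regexp.search(windows_line))

# Stripping the line terminator by hand is an alternative to changing the open mode.
matches_after_strip = bool(regexp.search(windows_line.rstrip('\r\n')))

print('{} {} {}'.format(matches_unix, matches_windows, matches_after_strip))
# prints: True False True
```

Note that the 'U' mode flag belongs to Python 2. In Python 3, text mode already applies universal newlines by default, and 'U' was removed in Python 3.11.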

Answer 2

Does the test file you provided contain any lines that end with two asterisks?

This regex should match all lines that end with two asterisks:

.*\*{2}$
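
As a quick check of how this pattern differs from the one in the question, the sketch below runs both against a few made-up strings (the samples are assumptions, not lines from testfile.txt). The pattern in this answer requires two adjacent asterisks at the end of the line, while the question's pattern requires whitespace between them, so which one is right depends on what the file actually contains.

```python
import re

adjacent = re.compile(r'.*\*{2}$')      # pattern from this answer
spaced = re.compile(r'[*][\s]+[*]$')    # pattern from the question

# Hypothetical sample strings, with no line terminators.
samples = ['value **', 'value * *', 'value *']

results = [(s, bool(adjacent.search(s)), bool(spaced.search(s))) for s in samples]

for text, hit_adjacent, hit_spaced in results:
    print('{!r}: adjacent={} spaced={}'.format(text, hit_adjacent, hit_spaced))
```

Either pattern is still affected by a trailing '\r' if the file has '\r\n' line endings and is read on Linux without newline translation.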

Difference in regex results: Windows vs. Linux (CentOS)
