Question

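I am trying to distinguish the Linux/Unix end-of-line character \n from the Windows end-of-line sequence \r\n. I can't seem to find a regex string that tells the two cases apart. My code is: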

import regex 

winpattern = regex.compile("[(?m)[\r][\n]$",regex.DEBUG|regex.MULTILINE)

linuxpattern = regex.compile("^*.[^\r][\n]$", regex.DEBUG)

for i, line in enumerate(open('file8.py')):
    for match in regex.finditer(linuxpattern, line):
        print 'Found on line %s: %s' % (i+1, match.groups())

Both winpattern and linuxpattern match Windows as well as Linux line endings. I want linuxpattern to match only Linux EOLs and winpattern to match only Windows EOLs. Any suggestions?

Answer 1

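When you open a file as a text file, Python by default uses universal newlines mode (see PEP 278), which means it converts all three newline types, \r\n, \r and \n, to just \n. So your regexes are irrelevant: you have already lost the information about the newline type by the time you have read the file.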
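A minimal sketch of that translation, assuming Python 3 and a throwaway temporary file (the file contents and variable names are made up for illustration). It writes Windows line endings in binary mode, then compares the raw bytes with what default text mode returns:

```python
import os
import tempfile

# Write CRLF line endings in binary mode so nothing gets translated on the way out.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(b"Hello\r\nWorld\r\n")
    path = tmp.name

try:
    # Binary mode returns the bytes exactly as stored on disk.
    with open(path, "rb") as f:
        raw = f.read()

    # Default text mode applies universal newlines: every \r\n becomes \n.
    with open(path) as f:
        text = f.read()
finally:
    os.remove(path)

print(repr(raw))   # b'Hello\r\nWorld\r\n'
print(repr(text))  # 'Hello\nWorld\n'
```

Any regex applied to `text` never sees a \r, which is why both patterns in the question appear to match the same thing.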

To disable newline conversion, pass the newline='' argument to open (use io.open for Python < 3):

$ echo 'Hello
> World
> ' > test.unix
$ cp test.unix test.dos
$ unix2dos test.dos
unix2dos: converting file test.dos to DOS format...
$ python3
Python 3.5.3 (default, Nov 23 2017, 11:34:05) 
[GCC 6.3.0 20170406] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> unix = open('test.unix', newline='').read()
>>> dos = open('test.dos', newline='').read()
>>> unix
'Hello\nWorld\n\n'
>>> dos
'Hello\r\nWorld\r\n\r\n'

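With the line endings preserved, these regexes will work:

>>> import re
>>> winregex = re.compile(r'\r\n')
>>> unixregex = re.compile(r'[^\r]\n')
>>> winregex.findall(unix)
[]
>>> winregex.findall(dos)
['\r\n', '\r\n', '\r\n']
>>> unixregex.findall(unix)
['o\n', 'd\n']
>>> unixregex.findall(dos)
[]

Note that $ matches just before a newline when re.MULTILINE is used, and only at the end of the string (or just before a trailing newline there) without it. To match every newline correctly, simply drop the $.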
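To see why the $ gets in the way, here is a small sketch that reuses the DOS-style string from the session above. The variable names are chosen only for illustration:

```python
import re

dos = "Hello\r\nWorld\r\n\r\n"

# With a trailing $, the pattern must be followed by a line end or the end of
# the string. After \r\n has been consumed, that only holds at the very end.
with_dollar = re.findall(r"\r\n$", dos, re.MULTILINE)

# Without the $, every Windows line ending in the string is found.
without_dollar = re.findall(r"\r\n", dos)

print(len(with_dollar))     # 1
print(len(without_dollar))  # 3
```

The anchored pattern finds only the final terminator, while the plain one finds all three.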

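If you want a regex that matches complete lines, extend each pattern so that it also consumes the line's content before the terminator, rather than matching the terminator alone.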
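One possible way to write such full-line patterns is sketched below. This is an illustration rather than the answerer's original code, and the names win_line and unix_line are hypothetical. It assumes the text was read with newline='' and ignores files that use a lone \r as terminator:

```python
import re

# A full Windows line: any run of non-terminator characters, then \r\n.
win_line = re.compile(r"[^\r\n]*\r\n")

# A full Unix line: the same run, then a \n that is NOT preceded by \r.
# The lookbehind keeps the \n half of a \r\n pair from matching on its own.
unix_line = re.compile(r"[^\r\n]*(?<!\r)\n")

unix = "Hello\nWorld\n\n"
dos = "Hello\r\nWorld\r\n\r\n"

print(win_line.findall(dos))    # ['Hello\r\n', 'World\r\n', '\r\n']
print(win_line.findall(unix))   # []
print(unix_line.findall(unix))  # ['Hello\n', 'World\n', '\n']
print(unix_line.findall(dos))   # []
```

Unlike [^\r]\n, the lookbehind version also reports empty lines, because it does not need a preceding character to consume.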
