Question

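I process text files generated by other people. Lines in these files are separated by 0xA characters, but occasionally a stray 0xD shows up in the middle of a line. This is how I read the file: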

for i, line in enumerate(open(file_path, "r", newline=chr(10))):
   ...

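It looks like even though I tell open to use 0xA as the line separator, it still gets confused by the stray 0xD, which makes it return incomplete lines. What am I missing?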

(I am processing the files on Windows.)

Answer 1

It seems to work as expected (Python 3.5):

>>> f = open('test.txt', 'wb') # write in binary mode so nothing is changed
>>> f.write('both\r\nnewline\ncarriagereturn\rbothagain\r\n'.encode('utf-8'))
40    
>>> f.close()

>>> open('test.txt', 'rb').read() # confirm data is intact
b'both\r\nnewline\ncarriagereturn\rbothagain\r\n'

>>> list(open('test.txt', 'r', newline=None)) # universal mode (convert everything to '\n')
['both\n', 'newline\n', 'carriagereturn\n', 'bothagain\n']

>>> list(open('test.txt', 'r', newline='')) # universal mode but leave data unchanged
['both\r\n', 'newline\n', 'carriagereturn\r', 'bothagain\r\n']

>>> list(open('test.txt', 'r', newline='\n')) # split only on '\n'
['both\r\n', 'newline\n', 'carriagereturn\rbothagain\r\n']

>>> list(open('test.txt', 'r', newline='\r')) # split only on '\r'
['both\r', '\nnewline\ncarriagereturn\r', 'bothagain\r', '\n']

>>> list(open('test.txt', 'r', newline='\r\n')) # split only on '\r\n'
['both\r\n', 'newline\ncarriagereturn\rbothagain\r\n']

Can you post some sample data, along with the code you used to verify the problem?
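
One way to narrow this down is to look at the raw bytes, since binary mode does no newline handling at all. The snippet below is only a rough sketch: `count_stray_cr` is a made-up helper name, and it assumes the file is small enough to read into memory. It counts the 0xD bytes that are not part of a `\r\n` pair, then reads the same bytes back with `newline='\n'` so the two results can be compared.

```python
import os
import tempfile


def count_stray_cr(path):
    """Count 0xD bytes that are not immediately followed by 0xA (hypothetical helper)."""
    with open(path, "rb") as f:  # binary mode: no newline translation
        data = f.read()
    # every '\r' minus the ones that belong to a '\r\n' pair
    return data.count(b"\r") - data.count(b"\r\n")


# Demo on a throwaway file holding the same bytes as test.txt above.
sample = b"both\r\nnewline\ncarriagereturn\rbothagain\r\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
demo_path = tmp.name

stray = count_stray_cr(demo_path)
with open(demo_path, "r", newline="\n") as f:  # split on '\n' only
    lines = list(f)
os.remove(demo_path)

print(stray, len(lines))
```

If the stray count is nonzero but the lines read with `newline='\n'` still look truncated, the cause is probably somewhere other than `open` itself, for example in how the lines are printed or compared afterwards.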

Answer 2

Could you split the lines manually?

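# newline="\n" turns off universal-newline translation, so a stray '\r' is not converted to '\n'
with open(file_path, "r", newline="\n") as f:
    for i, line in enumerate(f.read().split("\n")):
        ...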
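
If the file is too large to read in one go, a streaming variant of the same idea might look like the sketch below. It is an illustration rather than a tested drop-in: `iter_lf_lines` is a hypothetical name, the UTF-8 default is an assumption about the data (any encoding where the byte 0x0A only ever means a newline should work), and it relies on the fact that iterating a binary stream splits on `b"\n"` only.

```python
import io


def iter_lf_lines(stream, encoding="utf-8"):
    """Yield decoded lines from a binary stream, splitting on 0xA only."""
    for raw in stream:  # binary streams split on b"\n" and nothing else
        # drop the terminating '\n' but keep any stray '\r' inside the line
        yield raw.decode(encoding).rstrip("\n")


# Demo with an in-memory stream; for a real file, pass open(file_path, "rb").
demo = io.BytesIO(b"first\rstill first\nsecond\n")
result = list(iter_lf_lines(demo))
print(result)
```

Unlike `read().split("\n")`, this does not produce a trailing empty string when the file ends with a newline, and it leaves the 0xD characters in place so they can be stripped or replaced later if needed.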

Reading a file line by line with stray \r and \n characters
