Question

When I process right.txt and wrong.txt with the Python code below, wrong.txt fails even though the two files look exactly the same. Is this an indentation problem?

Here is my code:

import re
if __name__ == '__main__':
    with open('wrong.txt') as fin:
        text = fin.read()
    l = [p for p in text.split('\nSentence #') if p]
    for p in l:
        lines, deps = tuple(p.split('\n\n')[:2])

right.txt:

Sentence #1 (33 tokens):
introduction.
[Text=. CharacterOffsetBegin=208 CharacterOffsetEnd=209 PartOfSpeech=. Lemma=.] 
(ROOT

   (. .)))

root(ROOT-0, stored-18)

wrong.txt:

Sentence #1 (33 tokens):
introduction.
[Text=. CharacterOffsetBegin=208 CharacterOffsetEnd=209 PartOfSpeech=. Lemma=.] 
(ROOT

    (. .)))

root(ROOT-0, stored-18)

Answer 1

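I compared the two txt files and found that the difference is in the line endings (newlines) of every line. right.txt uses LF (0x0a, '\n'); wrong.txt uses CRLF (0x0d0a, '\r\n'). So this is not an indentation problem: in wrong.txt a blank line is '\r\n\r\n', which never matches the '\n\n' separator, so the split returns a single piece and the unpacking into lines, deps fails.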
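To make the failure concrete, here is a minimal sketch that reproduces it on in-memory data rather than the real files. The helper name `detect_line_ending` and the two sample byte strings are illustrative assumptions, not part of the original code. Note that Python 3's `open()` in default text mode already translates '\r\n' to '\n' on read, so the problem most likely shows up when the raw '\r\n' bytes survive into the string (for example Python 2 on a non-Windows system).

```python
def detect_line_ending(raw):
    # Hypothetical helper: report CRLF if any CRLF pair is present, else LF.
    return '\r\n' if b'\r\n' in raw else '\n'

# Shortened stand-ins for right.txt (LF) and wrong.txt (CRLF).
lf_sample = b"Sentence #1 (33 tokens):\nintroduction.\n\nroot(ROOT-0, stored-18)\n"
crlf_sample = lf_sample.replace(b"\n", b"\r\n")

# Splitting on a blank line works for LF but finds no separator for CRLF.
lf_parts = lf_sample.decode('ascii').split('\n\n')
crlf_parts = crlf_sample.decode('ascii').split('\n\n')

print(repr(detect_line_ending(lf_sample)), len(lf_parts))
print(repr(detect_line_ending(crlf_sample)), len(crlf_parts))

try:
    lines, deps = tuple(crlf_parts[:2])
except ValueError as exc:
    # Only one piece came back, so two-name unpacking fails.
    print('unpacking failed:', exc)
```

Reading the bytes (mode 'rb') is what lets the check see the real line endings, since text mode may hide them.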

With that in mind, the code could look like this:

import re
if __name__ == '__main__':
    with open('wrong.txt') as fin:
        text = fin.read()
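    # Pick the separator that matches the file's line endings.
    ending = '\r\n' if '\r\n' in text else '\n'
    l = [p for p in text.split(ending + 'Sentence #') if p]
    for p in l:
        # A blank line is two consecutive line endings.
        lines, deps = tuple(p.split(ending * 2)[:2])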
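An alternative, sketched here as an assumption rather than as the answerer's method, is to normalize the line endings once and keep the original '\n' separators. The function name `parse_sentences` and the sample string are hypothetical. This variant also copes with files that mix both kinds of line ending, and it skips chunks that have no blank line instead of raising.

```python
def parse_sentences(text):
    # Convert CRLF to LF so the '\n'-based separators below always match.
    text = text.replace('\r\n', '\n')
    pairs = []
    for chunk in text.split('\nSentence #'):
        if not chunk:
            continue
        parts = chunk.split('\n\n')
        if len(parts) < 2:
            continue  # no blank line in this chunk, nothing to unpack
        # Same as the original [:2] slice: keep the first two pieces.
        pairs.append((parts[0], parts[1]))
    return pairs

# Shortened CRLF sample standing in for wrong.txt.
sample = "Sentence #1 (33 tokens):\r\nintroduction.\r\n\r\n(ROOT\r\n\r\n    (. .)))\r\n"
pairs = parse_sentences(sample)
for lines, deps in pairs:
    print(repr(lines), repr(deps))
```

With a real file, the same function would be called on the result of `fin.read()`.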

Python: wrong file format causes an unpacking error
