When I process right.txt and wrong.txt with the Python code below, wrong.txt fails to run even though the two files look exactly the same. Is this an indentation problem?
Here is my code:
import re
if __name__ == '__main__':
    with open('wrong.txt') as fin:
        text = fin.read()
    l = [p for p in text.split('\nSentence #') if p]
    for p in l:
        lines, deps = tuple(p.split('\n\n')[:2])
right.txt:
Sentence #1 (33 tokens):
introduction.
[Text=. CharacterOffsetBegin=208 CharacterOffsetEnd=209 PartOfSpeech=. Lemma=.]
(ROOT
(. .)))
root(ROOT-0, stored-18)
wrong.txt:
Sentence #1 (33 tokens):
introduction.
[Text=. CharacterOffsetBegin=208 CharacterOffsetEnd=209 PartOfSpeech=. Lemma=.]
(ROOT
(. .)))
root(ROOT-0, stored-18)
Answer (score: 0)
I compared the two txt files and found that all 4111 differences are line endings. right.txt uses LF (0x0a, '\n'); wrong.txt uses CRLF (0x0d 0x0a, '\r\n').
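One way to check which line endings a file actually uses is to read it in binary mode, where Python does not translate newlines, and count the byte sequences. The sketch below uses a hypothetical helper named line_ending_counts and temporary files standing in for right.txt and wrong.txt. It also shows why the question's code breaks on CRLF text: a blank line becomes '\r\n\r\n', which contains no '\n\n', so the split returns a single chunk and unpacking it into lines, deps would raise ValueError.

```python
import os
import tempfile


def line_ending_counts(path):
    """Count CRLF and bare-LF line endings in a file (hypothetical helper).

    The file is opened in binary mode so that '\r\n' is not translated
    to '\n' while reading.
    """
    with open(path, 'rb') as fin:
        data = fin.read()
    crlf = data.count(b'\r\n')
    lf_only = data.count(b'\n') - crlf  # every CRLF also contains one LF
    return {'crlf': crlf, 'lf': lf_only}


if __name__ == '__main__':
    sample = 'Sentence #1 (33 tokens):\nintroduction.\n\n(ROOT\n(. .)))\n'

    # Write the same text twice: once with LF, once with CRLF endings.
    paths = {}
    for name, ending in (('right', b'\n'), ('wrong', b'\r\n')):
        fd, path = tempfile.mkstemp(suffix='.txt')
        with os.fdopen(fd, 'wb') as fout:
            fout.write(sample.encode('ascii').replace(b'\n', ending))
        paths[name] = path

    for name, path in paths.items():
        print(name, line_ending_counts(path))

    # With CRLF endings the blank line is '\r\n\r\n', so '\n\n' never occurs
    # and split() returns the whole text as one chunk.
    crlf_text = sample.replace('\n', '\r\n')
    print(len(sample.split('\n\n')), len(crlf_text.split('\n\n')))

    for path in paths.values():
        os.remove(path)
```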
With that in mind, the code could look like this:
import re
if __name__ == '__main__':
    with open('wrong.txt') as fin:
        text = fin.read()
    # Use whichever line ending the file actually contains
    ending = '\r\n' if '\r\n' in text else '\n'
    l = [p for p in text.split(ending + 'Sentence #') if p]
    for p in l:
        # A blank line is two consecutive line endings
        lines, deps = tuple(p.split(ending * 2)[:2])
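An alternative to detecting the ending is to normalize it before splitting, so the rest of the code only ever deals with '\n'. In Python 3, open() in text mode already does this by default (newline=None, universal newlines mode), so the question's original code would probably only see '\r\n' under Python 2 or when newline translation is turned off; the question does not say which Python version was used, so treat that as an inference. The sketch below assumes a hypothetical function named split_sentences and uses io.open, which accepts the newline argument in both Python 2.6+ and Python 3.

```python
import io
import os
import tempfile


def split_sentences(path):
    """Return (lines, deps) pairs from a file with LF or CRLF endings.

    Hypothetical helper. io.open with newline=None enables universal
    newlines, translating '\r\n' and '\r' to '\n' on read; the explicit
    replace() also covers text that reached here some other way.
    """
    with io.open(path, 'r', newline=None) as fin:
        text = fin.read()
    text = text.replace('\r\n', '\n')
    pairs = []
    for p in text.split('\nSentence #'):
        if not p:
            continue
        # Same logic as the question: the first two blank-line-separated parts
        lines, deps = p.split('\n\n')[:2]
        pairs.append((lines, deps))
    return pairs


if __name__ == '__main__':
    fd, path = tempfile.mkstemp(suffix='.txt')
    with os.fdopen(fd, 'wb') as fout:
        # Two sentences written with Windows (CRLF) line endings
        fout.write(b'Sentence #1 (2 tokens):\r\nintro.\r\n\r\nroot(ROOT-0, x-1)\r\n'
                   b'\r\nSentence #2 (1 tokens):\r\nend.\r\n\r\nroot(ROOT-0, y-1)\r\n')
    for lines, deps in split_sentences(path):
        print(repr(lines), repr(deps))
    os.remove(path)
```

Normalizing once at read time keeps the splitting logic identical for both files, at the cost of not preserving the original line endings if the text is written back out.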