
Question

I am reading a file in Python and splitting it on '\n'. When I print the resulting list, the first entry comes out mangled instead of as 'Magnificent Mary'.

Here is my code:

with open('/home/naveen/Desktop/answer.txt') as ans:
    content = ans.read()
content = content.split('\n')
print content

Note: answer.txt contains the following lines:

Magniﬁcent Mary

Flying Sikh

Payyoli Express

Here is the output of my program:

Answer 1

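The problem lies in your text file. There are some Unicode characters in "Magniﬁcent Mary". If you fix that, your program should work. If you want to read text that contains Unicode characters, you have to decode it properly as UTF-8.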
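To illustrate why undecoded text shows up as escaped bytes, here is a minimal sketch. It assumes Python 3 and uses a sample string built from the file contents shown in the question; the variable names are only illustrative.

```python
# First line of answer.txt, containing U+FB01 (the "fi" ligature).
line = u"Magni\ufb01cent Mary"

# Stored as UTF-8, the single ligature character becomes three bytes.
raw = line.encode("utf-8")
print(repr(raw))                # b'Magni\xef\xac\x81cent Mary'

# Decoding those bytes as UTF-8 restores the one-character ligature.
decoded = raw.decode("utf-8")
print(len(raw), len(decoded))   # 17 15
```

A Python 2 str read from the file holds the raw bytes, which is why printing the list shows the escaped byte values rather than a readable character.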

Have a look at this (assuming you want to use Python 2): Backporting Python 3 open(encoding="utf-8") to Python 2

Python 2:

import codecs

# codecs.open decodes the file from UTF-8 to unicode while reading
with codecs.open(filename='/Users/emily/Desktop/answers.txt', mode='rb', encoding='UTF-8') as ans:
  content = ans.read().splitlines()
  for i in content: print i

If you can use Python 3, you can simply do this:

with open('/home/naveen/Desktop/answer.txt', encoding='UTF-8') as ans:
  content = ans.read().splitlines()
print(content)
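If the same script has to run under both Python 2 and Python 3, io.open is one option: it accepts encoding= on Python 2.6 and later, and on Python 3 it is the same function as the built-in open. The sketch below is an illustration only; it writes a throwaway sample file (a hypothetical stand-in for answer.txt) so that it is self-contained.

```python
import io
import os
import tempfile

# Hypothetical sample file standing in for answer.txt.
path = os.path.join(tempfile.mkdtemp(), "answer.txt")
with io.open(path, "w", encoding="utf-8") as out:
    out.write(u"Magni\ufb01cent Mary\nFlying Sikh\nPayyoli Express\n")

# Reading with encoding="utf-8" yields decoded text, not raw bytes.
with io.open(path, encoding="utf-8") as ans:
    content = ans.read().splitlines()

print(len(content))      # 3
print(len(content[0]))   # 15: the ligature counts as one character
```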

Answer 2

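The problem is the 'f' in Magniﬁcent Mary. It is not a normal f; it is LATIN SMALL LIGATURE FI. You can simply delete that 'f' and retype it yourself in gedit. To verify the difference, just include

print [(ord(a), a) for a in (file.split("\n"))[0]]

at the end of your code (where file holds the text read from the file) and compare the output for the two f's.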
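A variation on the same check, sketched for Python 3, asks unicodedata for the official name of every non-ASCII character. The sample string is taken from the file contents in the question; the names line and names are illustrative.

```python
import unicodedata

line = u"Magni\ufb01cent Mary"

# Collect the code point and Unicode name of every non-ASCII character.
names = [(ord(ch), unicodedata.name(ch)) for ch in line if ord(ch) > 127]

for code_point, name in names:
    print("U+%04X %s" % (code_point, name))   # U+FB01 LATIN SMALL LIGATURE FI
```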

If you cannot edit the file, you can first convert the string to unicode and then use Python's unicodedata module.

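import unicodedata

# Python 2: read the raw bytes, then decode them to unicode
with open("answer.txt") as f:
    text = f.read().decode('utf-8')
# NFKD decomposes the ligature into plain 'f' and 'i'; 'ignore' drops anything still non-ASCII
print unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').split("\n")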
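For comparison, the same normalization could look roughly like this in Python 3. The helper name to_ascii is hypothetical, and the input string is a stand-in for the decoded contents of answer.txt.

```python
import unicodedata

def to_ascii(text):
    """Decompose compatibility characters, then drop anything non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

# Stand-in for the decoded contents of answer.txt.
lines = u"Magni\ufb01cent Mary\nFlying Sikh\nPayyoli Express".split("\n")
cleaned = [to_ascii(line) for line in lines]
print(cleaned)   # ['Magnificent Mary', 'Flying Sikh', 'Payyoli Express']
```

Note that errors="ignore" silently discards any character that has no ASCII decomposition, so this approach is only appropriate when losing such characters is acceptable.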

Python file reading and splitting words
