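I wrote a short script that reads from a file containing information about blog articles. Each line in the file corresponds to one article, with tab-separated columns holding the article's id, title, and paragraph.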
id title paragraph
1 Motorola prototypes from Frog Some cool looking concepts for phones, watches etc
2 Digital everything This new york times article talks about the willingness of consumers
3 E-mails banned at summer camps E-mails compound feelings of homesickness in kids
4 Simple Multimedia Websites/e-mail This is a sort of website/e-mail generation site
5 Campground wi-fi Wi-fi is now on the list of amenities offered at many campgrounds
6 Fog screen Literally, a screen made by projecting onto fog
This code splits the file on '\n', so that each article becomes one element of a list:
# Open file and skip first line(headers)
file = open("RBArticlesTabClean.txt", "r", encoding="utf-8")
file.readline()
# Read and decode whole file
articlesFile = htmlcodes.decodeString(file.read()).lower()
# Split file into its lines
articlesFileList = articlesFile.split("\n")
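To test whether this works and whether the program reads the file correctly, I iterate over the resulting list of articles and print out the whole file: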
for each in articlesFileList:
    input(each)
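When run in IDLE, it works as expected, printing each line (in lowercase) every time the user presses Enter.
However, when the script is run from the command prompt, it fails after printing three articles, with this error: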
Traceback (most recent call last):
  File "E:\Python\RBTrends\RBTrendsAnalysis.py", line 52, in <module>
    print(each)
  File "C:\Python34\lib\encodings\cp850.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2019' in position 89: character maps to <undefined>
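I have two questions:
1) Why am I getting this error?
2) Why is there a difference between running the program in IDLE and from the command prompt?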
Answer 0 (score: 1)
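As far as I know, IDLE can display Unicode characters, whereas the command prompt encodes everything it prints with the console's code page (cp850, according to your traceback). That code page covers little beyond plain old ASCII plus some Western European characters, and it has no mapping for '\u2019', the right single quotation mark. That's why you get this error: the first three articles happen to contain only characters cp850 can encode, and the fourth line printed contains a curly apostrophe.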
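To make this concrete, here is a minimal sketch that reproduces the failure without a console and shows one possible workaround. The sample string and the helper name `to_console_safe` are made up for illustration; only the cp850 encoding and the '\u2019' character come from the traceback in the question.

```python
# Hypothetical sample line containing U+2019 (right single quotation mark),
# the character named in the traceback.
line = "it\u2019s a sort of website/e-mail generation site"

# Reproduce the failure: cp850 has no mapping for U+2019, so a strict
# encode raises the same UnicodeEncodeError that print() hit in the console.
try:
    line.encode("cp850")
    reproduced = False
except UnicodeEncodeError:
    reproduced = True


def to_console_safe(text, encoding="cp850"):
    """Return text with characters the target encoding lacks replaced by '?'.

    Hypothetical helper: it round-trips the string through the console
    encoding with errors="replace" instead of the default strict handling.
    """
    return text.encode(encoding, errors="replace").decode(encoding)


# After replacement the string contains only characters cp850 can encode.
print(to_console_safe(line))
```

In the original script this would mean calling something like `print(to_console_safe(each))` inside the loop. The trade-off is that the curly apostrophe is shown as '?', so it hides the character rather than displaying it. An alternative that avoids changing the code, assuming a reasonably recent Python 3, is to set the `PYTHONIOENCODING` environment variable (for example to `cp850:replace`) before starting the script.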