UnicodeEncodeError: 'charmap' codec can't encode character (Python error)

Time: 2017-03-10 12:18:27

Tags: python

I want to write a simple Python program that finds the most frequently used words in a file. My file's contents look like this:

<text>աաա բբբ գգգ աաաա բբբ</text>
.....
<text>բբբ աաագգգ աաաա բբբ</text>
.....
<text>աաաաաաա բբբ հհհհ բբբ գգգ </text>

Here is my Python code:

# -*- coding: utf-8 -*-
import re
import collections
a = open('dump.txt', encoding='UTF-8', errors='replace')
contents = a.read()
articlelist = re.findall(r'<text[^>]+>([^<]+)</text>', contents, re.M)
wordsandnumber = []
for article in articlelist:
    wordsinarticle = re.findall(r'\w+', article)
    for finaly in wordsinarticle:
        wordsandnumber.append(finaly)
counter = collections.Counter(wordsandnumber)
mylist = counter.most_common()
open('as.txt', 'w').write('\n'.join('%s %s' % x for x in mylist))
print(counter.most_common())

But the code fails with this error:

Traceback (most recent call last):
  File "C:\Users\Home\Downloads\test.py", line 14, in <module>
    open('as.txt', 'w').write('\n'.join('%s %s' % x for x in mylist))
  File "C:\Users\Home\AppData\Local\Programs\Python\Python35-32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0567' in position 0: character maps to <undefined>

I am a beginner at programming. Please help me fix this problem and understand why this code doesn't work.

In case it matters: I am using Windows 10 and Python 3.5.
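
A note on the likely cause, offered as a sketch rather than a confirmed diagnosis: the traceback passes through cp1252.py, which suggests that open('as.txt', 'w') fell back to the Windows locale encoding because no encoding argument was given. cp1252 has no mapping for Armenian letters such as '\u0567', hence "character maps to <undefined>". The input file is opened with encoding='UTF-8', so reading works; only the write fails. The version below passes encoding='utf-8' when writing. The helper names count_words and write_counts, the temporary output location, and the relaxed pattern <text[^>]*> (which also accepts a bare <text> tag as in the sample) are assumptions made for illustration, not part of the original program.

```python
import collections
import re
import tempfile
from pathlib import Path


def count_words(text):
    """Count words found inside <text>...</text> elements of a string."""
    # Assumption: [^>]* instead of [^>]+ so that a bare <text> tag also matches.
    articles = re.findall(r'<text[^>]*>([^<]+)</text>', text)
    counter = collections.Counter()
    for article in articles:
        # In Python 3, \w matches Unicode letters, including Armenian.
        counter.update(re.findall(r'\w+', article))
    return counter


def write_counts(counter, path):
    """Write 'word count' lines, most frequent first, as UTF-8."""
    # An explicit encoding avoids the locale default (cp1252 in the traceback).
    with open(path, 'w', encoding='utf-8') as out:
        out.write('\n'.join('%s %s' % pair for pair in counter.most_common()))


# Two lines taken from the sample data in the question.
sample = (
    '<text>աաա բբբ գգգ աաաա բբբ</text>\n'
    '<text>բբբ աաագգգ աաաա բբբ</text>'
)
counts = count_words(sample)

# Hypothetical output location; the original script writes to 'as.txt'.
out_path = Path(tempfile.gettempdir()) / 'as.txt'
write_counts(counts, out_path)
```

In the original script the equivalent change would be open('as.txt', 'w', encoding='utf-8'). The final print(counter.most_common()) call may raise the same kind of error if the console encoding cannot represent Armenian characters, so it may be worth checking the output file first.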

0 answers:

No answers