Question

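I have a source file that contains UTF-8 characters (names).
I also have a text file with the same character encoding.
I'm working with HTML pages: I fetch them and cut out the useful information.
In the "FriendsNames" txt file I'm using characters such as "éáűúőóüöäđĐ".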

I get this error:

Traceback (most recent call last):
  File "C:\Users\Rendszergazda\workspace\achievements\hiba.py", line 9, in <module>
    s = str(urlopen("http://eu.battle.net/wow/en/character/arathor/"+str(names[0])+"/achievement").read(), encoding='utf-8')
  File "C:\Python27\lib\encodings\cp1250.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character u'\ufeff' in position 0: character maps to <undefined>

What do you think? What is my problem?

from urllib import urlopen
import codecs

result = codecs.open("C:\Users\Desktop\Achievements\Result.txt", "a", "utf-8")
fh = codecs.open("C:\Users\Desktop\Achievements\FriendsNames.txt", "r", "utf-8")
line = fh.readline()
names = line.split(" ")
fh.close()

s = str(urlopen("http://eu.battle.net/wow/en/character/arathor/"+str(names[0])+"/achievement").read(), encoding='utf8')
result.write(str(s))
result.close()

Answer 1

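The problem you're running into is that you're calling str(names[0]), where names[0] is a unicode string. That means it gets encoded with the default encoding, which for some reason appears to be cp1250. (Did you mess with sys.setdefaultencoding()? Don't do that.)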
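To make the implicit step concrete, here is a minimal sketch that performs the same encode explicitly, so it runs on both Python 2 and Python 3. The sample name and the helper try_encode are hypothetical, and it assumes the first token read from the file starts with a byte order mark (u'\ufeff'), as the error message suggests.

```python
# -*- coding: utf-8 -*-
# Sketch: in Python 2, str(u) on a unicode object behaves like
# u.encode(<default encoding>). The explicit .encode() call below
# reproduces that step in a way that also runs on Python 3.

def try_encode(text, encoding):
    """Return (True, bytes) if text encodes cleanly, else (False, message)."""
    try:
        return True, text.encode(encoding)
    except UnicodeEncodeError as exc:
        return False, str(exc)

# Hypothetical first token of FriendsNames.txt: a BOM followed by a name.
first_name = u"\ufeffJ\xf3zsef"

# cp1250 has no mapping for u'\ufeff', so this fails at position 0.
ok_cp1250, detail = try_encode(first_name, "cp1250")

# UTF-8 can represent every code point, so this succeeds.
ok_utf8, encoded = try_encode(first_name, "utf-8")

print(ok_cp1250, detail)
print(ok_utf8, repr(encoded))
```

Note that the UTF-8 result still begins with the three BOM bytes, so encoding explicitly avoids the exception but does not by itself remove the stray character from the name.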

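To get a byte string from a unicode object, you should encode it explicitly. Don't just call str(). Encode it with the encoding the result is supposed to have (a bit hard to guess in the case of a URL, but probably UTF-8 here), so use `names[0].encode('utf-8')`. You may also need to quote the non-ASCII characters in the URL, but that depends on what the remote end expects.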
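A rough sketch of that approach, assuming the remote end expects percent-encoded UTF-8 (the source does not confirm this). The helpers read_names and achievement_url are hypothetical names. Opening the file with the 'utf-8-sig' codec is an addition here: it drops a leading u'\ufeff' byte order mark like the one named in the traceback.

```python
# -*- coding: utf-8 -*-
import io

try:
    from urllib import quote          # Python 2
except ImportError:
    from urllib.parse import quote    # Python 3

BASE_URL = "http://eu.battle.net/wow/en/character/arathor/"

def read_names(path):
    """Read the first line of a UTF-8 file and split it into unicode names.

    The 'utf-8-sig' codec skips a leading byte order mark if one is present.
    split() with no argument also discards the trailing newline.
    """
    with io.open(path, "r", encoding="utf-8-sig") as fh:
        return fh.readline().split()

def achievement_url(name):
    """Encode the name to UTF-8 bytes explicitly, then percent-quote them."""
    # Assumption: the server expects percent-encoded UTF-8 in the path.
    return BASE_URL + quote(name.encode("utf-8")) + "/achievement"
```

With these helpers, the request in the question would become something like urlopen(achievement_url(names[0])), with no call to str() on the unicode name.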

Python - character encoding and decoding problem
