Question

I am using urllib.request.urlopen to fetch *.srt files from a web API. The relevant code (Python 3.x):

with urllib.request.urlopen(req) as response:
    result = response.read().decode('utf-8')
    print(result)

    with open(subpath, 'w') as file:

        file.write(result)
        file.close()

This works fine for most files. For some files, however, I get the following error: UnicodeEncodeError: 'charmap' codec can't encode character '\u266a' in position 37983: character maps to <undefined>

(\u266a is the eighth note symbol.)
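For context, the error can be reproduced without any network request. This is a minimal sketch, assuming the platform's default text encoding is cp1252 (common on Windows, but not stated in the question): the character decodes fine from UTF-8, and the failure only appears when it is encoded to a code page that has no slot for it.

```python
note = '\u266a'  # U+266A EIGHTH NOTE

# UTF-8 can represent every Unicode character, so this succeeds.
utf8_bytes = note.encode('utf-8')

# cp1252 is an assumed default encoding; it has no mapping for U+266A,
# so encoding raises UnicodeEncodeError.
try:
    note.encode('cp1252')
    cp1252_failed = False
except UnicodeEncodeError as exc:
    cp1252_failed = True
    error_message = str(exc)
```

If this is what happens, both print(result) and open(subpath, 'w') can trigger the error, since each encodes the text with a platform-dependent default.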

How can I work around this? Can I filter this character out of the bytes object returned by .read()? Or can I ignore encoding errors? Thanks in advance.

Also, note that I did find many threads about the '... can't encode character ...' error; in most of them, however, using .decode('utf-8') was the solution.

Answer 1

I could not fix the encoding error itself, but I found a workaround.

Opening the file in binary mode lets you write the bytes object directly, so no decoding is needed:

with urllib.request.urlopen(req) as response:
    result = response.read()
    # print(result)

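    # Binary mode writes the bytes exactly as received; no text encoding is involved.
    # The with block closes the file, so no explicit close() is needed.
    with open(subpath, 'wb') as file:
        file.write(result)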
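If the decoded text is still needed (for example, to process the subtitles before saving), another option is to keep text mode and name the encoding explicitly. This is a sketch rather than a tested fix for the original setup: the sample subtitle data and the temporary path are hypothetical stand-ins for response.read() and subpath. The last lines show the "ignore encoding errors" option asked about in the question, which silently drops characters the target encoding cannot represent.

```python
import os
import tempfile

# Hypothetical stand-in for response.read(): UTF-8 bytes containing U+266A.
raw = '1\n00:00:01,000 --> 00:00:04,000\n\u266a La la la \u266a\n'.encode('utf-8')
text = raw.decode('utf-8')

# Hypothetical output path standing in for subpath.
subpath = os.path.join(tempfile.mkdtemp(), 'example.srt')

# An explicit encoding avoids the platform default (e.g. cp1252 on Windows).
# newline='' keeps line endings exactly as they are in the text.
with open(subpath, 'w', encoding='utf-8', newline='') as file:
    file.write(text)

# Read the file back in binary mode to compare with the original bytes.
with open(subpath, 'rb') as file:
    written = file.read()

# Lossy alternative: drop characters that cp1252 cannot encode.
lossy = text.encode('cp1252', errors='ignore')
```

Writing with encoding='utf-8' preserves the note symbol, whereas errors='ignore' removes it, so the explicit encoding is usually the safer choice for subtitle files.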
