Question

I'm using Postman to send an Excel file, which I'm reading in Tornado.

Tornado code:

self.request.files['1'][0]['body'].decode()

The code above works if I send a .csv file.

If I send an .xlsx file, I get this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x87 in position 10: invalid start byte

request.files does receive the file, but its type is bytes. To convert the bytes to str I'm using decode(), which works for .csv but not for .xlsx.

I also tried decode('utf-8'), but still no luck.

I've searched, but haven't found any question that mentions this 0x87 issue.
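A minimal, stdlib-only sketch of why the two uploads behave differently, assuming the body bytes are exactly the file as it sits on disk. An .xlsx file (Office Open XML) is a ZIP archive of XML parts, so its bytes are binary data rather than text in some encoding: decode() succeeds on CSV bytes and fails on ZIP bytes. The archive below is a hypothetical stand-in built with zipfile, not a real workbook; make_stand_in_xlsx, csv_body and xlsx_body are illustrative names.

```python
import io
import zipfile


def make_stand_in_xlsx() -> bytes:
    """Hypothetical stand-in for an uploaded .xlsx: a tiny ZIP archive built in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        # Stored uncompressed, so the raw 0x87 byte ends up inside the archive bytes
        zf.writestr("a.bin", b"\x87")
    return buf.getvalue()


csv_body = "name,qty\nwidget,3\n".encode("utf-8")  # what a .csv upload looks like
xlsx_body = make_stand_in_xlsx()                    # what a ZIP-based upload looks like

print(csv_body.decode())                          # fine: a CSV file is text
print(xlsx_body[:4])                              # the ZIP local file header signature
print(zipfile.is_zipfile(io.BytesIO(xlsx_body)))  # True

try:
    xlsx_body.decode("utf-8")
except UnicodeDecodeError as exc:
    # ZIP bytes are not valid UTF-8, so the same decode() call fails here
    print("UnicodeDecodeError:", exc)
```

To read a real workbook, the bytes would typically be wrapped in io.BytesIO and handed to an Excel library instead of being decoded as text; for example, openpyxl's load_workbook accepts a file-like object opened in binary mode.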

Answer 1

The reason is that the .xlsx file uses a different encoding, not UTF-8. You need to decode the file using its original encoding.

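There is no guaranteed way to determine a file's encoding programmatically. I'm guessing you're building this app for ordinary users, so you will run into files in different and unexpected encodings.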

A good way to handle this is to try several encodings in turn, falling back to the next one whenever decoding fails. For example:

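# runs inside a tornado.web.RequestHandler method such as post()
# iso-8859-1 maps every byte to a character, so it never raises;
# keep it last, otherwise the entries after it are never tried
encodings = ['utf-8', 'windows-1251', 'windows-1252', 'iso-8859-1']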

for encoding in encodings:
    try:
        decoded_file = self.request.files['1'][0]['body'].decode(encoding)
    except UnicodeDecodeError:
        # this will run when the current encoding fails
        # just ignore the error and try the next one
        pass
    else:
        # this will run when an encoding passes
        # break the loop
        # it is also a good idea to re-encode the
        # decoded file as UTF-8 for your purposes
        decoded_file = decoded_file.encode("utf8")
        break
else:
    # this will run when the for loop ends
    # without successfully decoding the file
    # now you can return an error message
    # to the user asking them to change 
    # the file encoding and re upload
    self.write("Error: Unidentified file encoding. Re-upload with UTF-8 encoding")
    return

# when the program reaches here, it means 
# you have successfully decoded the file 
# and you can access it from `decoded_file` variable

For a list of common encodings, see: What is the most common encoding of each language?

Answer 2

Try this, based on the suggestion given here:

self.request.files['1'][0]['body'].decode('iso-8859-1').encode('utf-8')
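A short sketch of why this one-liner never raises and what it does to binary content. ISO-8859-1 maps each of the 256 byte values to the code point with the same number, so decoding always succeeds. Re-encoding as UTF-8 then turns every byte at or above 0x80 into two bytes, so the result is no longer the uploaded file's bytes (for an .xlsx, no longer a valid ZIP archive). The mapping is reversible, as the last line shows. The payload here is a hypothetical one containing every byte value.

```python
raw = bytes(range(256))               # hypothetical binary payload: every byte value once
text = raw.decode("iso-8859-1")       # never raises: byte N becomes code point U+00NN
reencoded = text.encode("utf-8")      # what the one-liner above produces

print(len(raw), len(text), len(reencoded))  # 256 256 384 (bytes >= 0x80 now take two bytes)
print(reencoded == raw)                     # False: the content has changed
print(reencoded.decode("utf-8").encode("iso-8859-1") == raw)  # True: the original can be recovered
```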

Getting UnicodeDecodeError when reading Excel in Tornado, Python
