Question

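I'm trying to get Biopython's GenomeDiagram functionality working, but it currently fails. Here is the output; I'm not sure what the error means. Any suggestions?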

======================================================================
ERROR: test_partial_diagram (test_GenomeDiagram.DiagramTest)
construct and draw SVG and PDF for just part of a SeqRecord.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_GenomeDiagram.py", line 662, in test_partial_diagram
    assert open(output_filename).read().replace("\r\n", "\n") \
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 11: invalid start byte

Answer 1

Your data file consists of bytes in some encoding other than UTF-8. You need to specify the correct encoding:

 open(output_filename, encoding=...)

There is no foolproof way to tell what the encoding should be. But since

In [156]: print(b'\x93'.decode('cp1252'))
“

(and since a quotation mark is a very common character), you might want to try

open(output_filename, encoding='cp1252')

on line 662 of test_GenomeDiagram.py.
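
If you are unsure which encoding applies, one option is to read the raw bytes and try a short list of candidates in order. The following is only a minimal sketch: the helper name `read_text`, the candidate list, and the demo bytes are my own assumptions, not anything from Biopython or its test suite.

```python
import os
import tempfile


def read_text(path, encodings=("utf-8", "cp1252")):
    """Hypothetical helper: decode a file with the first encoding that works.

    Returns a (text, encoding_used) tuple.
    """
    with open(path, "rb") as handle:
        raw = handle.read()
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # this candidate cannot decode the bytes, try the next
    raise ValueError("none of the candidate encodings could decode the file")


# Demo with made-up data: 0x93 and 0x94 are curly quotes in cp1252,
# but 0x93 is not a valid first byte of a UTF-8 character.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"He said \x93hi\x94\r\n")
    demo_path = tmp.name

text, used = read_text(demo_path)
os.remove(demo_path)
print(used, repr(text))
```

A successful decode only shows that the bytes are legal in that encoding, not that the guess is right, so it is still worth checking the resulting text by eye.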

Answer 2

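UTF-8 is a variable-length encoding. When a character needs more than one byte, the second and subsequent bytes have the form 10xxxxxx, and no initial byte (or single-byte character) has that form. 0x93 is 10010011 in binary, so it cannot be the first byte of a UTF-8 character. The error message is telling you that the buffer contains an invalid UTF-8 byte sequence.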
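
To make the bit patterns concrete, here is a small sketch that classifies a byte by its leading bits. The function name `utf8_byte_role` is my own, and it looks only at the bit pattern; it does not check the further restrictions UTF-8 places on some lead bytes.

```python
def utf8_byte_role(byte):
    """Hypothetical helper: classify one byte value by its leading bits."""
    if byte & 0b10000000 == 0b00000000:
        return "single-byte (ASCII)"   # 0xxxxxxx
    if byte & 0b11000000 == 0b10000000:
        return "continuation"          # 10xxxxxx
    if byte & 0b11100000 == 0b11000000:
        return "lead of 2-byte"        # 110xxxxx
    if byte & 0b11110000 == 0b11100000:
        return "lead of 3-byte"        # 1110xxxx
    if byte & 0b11111000 == 0b11110000:
        return "lead of 4-byte"        # 11110xxx
    return "invalid"                   # 11111xxx never appears in UTF-8


bits = format(0x93, "08b")    # binary form of the offending byte
role = utf8_byte_role(0x93)
print(bits, role)

# Decoding a buffer that starts with a continuation byte fails the same
# way as in the traceback from the question.
try:
    b"\x93".decode("utf-8")
    reason = None
except UnicodeDecodeError as exc:
    reason = exc.reason
print(reason)
```

Because 0x93 matches the 10xxxxxx pattern, the decoder finds a continuation byte where it expects the start of a character and gives up.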
