When I run the following code:
import numpy as np
import mysql.connector
connection = mysql.connector.connect(...) # connection params here
cursor = connection.cursor()
cursor.execute('create table test_table(value blob)')
cursor.execute('insert into test_table values (_binary %s)', (np.random.sample(10000).astype('float').tobytes(),))
cursor.execute('select * from test_table')
cursor.fetchall()
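I get the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf7 in position 1: invalid start byte
(...followed by a stack trace that I don't think is useful)
It seems that MySQL Connector is trying to convert my BLOB to a string (and failing). How can I fetch this data as bytes, without any conversion?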
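To see why the decode fails: the raw bytes of float64 values are almost never valid UTF-8. The sketch below uses the standard-library struct module instead of NumPy (an assumption made only to keep it self-contained); is_valid_utf8 is a hypothetical helper name.

```python
import struct


def is_valid_utf8(data):
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Pack float64 values little-endian, similar to ndarray.tobytes() on x86 machines.
values = (0.1, 0.5, 0.9)
blob = struct.pack("<3d", *values)

# The first byte of 0.1 is 0x9a, a UTF-8 continuation byte, so decoding fails.
print(is_valid_utf8(blob))  # False

# Kept as bytes, the data round-trips exactly.
restored = struct.unpack("<3d", blob)
print(restored == values)  # True
```

With NumPy, the equivalent read-back step would be np.frombuffer(blob, dtype=np.float64), provided the driver hands back bytes rather than a decoded string.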
Answer 0 (score: 4)
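We ran into the same problem: BLOBs were misread as UTF-8 strings with MySQL 8.0.13, mysql-connector-python 8.0.13 and SQLAlchemy 1.2.14.
What did the trick for us was enabling the use_pure option of MySQL Connector. The default value of use_pure changed in 8.0.11, and the new default is to use the C extension. So we set the option explicitly: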
create_engine(uri, connect_args={'use_pure': True}, ...)
Details of our error and stack trace:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9c in position 1: invalid start byte
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
....
File "/usr/local/lib/python3.6/site-packages/mysql/connector/cursor_cext.py", line 272, in execute
self._handle_result(result)
File "/usr/local/lib/python3.6/site-packages/mysql/connector/cursor_cext.py", line 163, in _handle_result
self._handle_resultset()
File "/usr/local/lib/python3.6/site-packages/mysql/connector/cursor_cext.py", line 651, in _handle_resultset
self._rows = self._cnx.get_rows()[0]
File "/usr/local/lib/python3.6/site-packages/mysql/connector/connection_cext.py", line 273, in get_rows
row = self._cmysql.fetch_row()
SystemError: <built-in method fetch_row of _mysql_connector.MySQL object at 0x5627dcfdf9f0> returned a result with an error set
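For code that calls mysql.connector directly rather than going through SQLAlchemy, the same use_pure option can be passed to mysql.connector.connect(). A minimal sketch, assuming a test_table like the one in the question; fetch_blobs and connect_params (a dict with your host, user, password and database) are hypothetical names.

```python
def fetch_blobs(connect_params, query="select value from test_table"):
    """Fetch BLOB values as bytes using the pure-Python connector implementation."""
    import mysql.connector  # imported lazily so this module loads without the driver

    # use_pure=True bypasses the C extension that raised the error above
    connection = mysql.connector.connect(use_pure=True, **connect_params)
    try:
        cursor = connection.cursor()
        cursor.execute(query)
        # BLOB columns are expected to come back as bytes or bytearray here;
        # bytes() normalizes both to an immutable bytes object.
        return [bytes(row[0]) for row in cursor.fetchall()]
    finally:
        connection.close()
```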
Answer 1 (score: 1)
Another approach is to pass the raw=True argument when initializing the connection:
connection = mysql.connector.connect(
host="localhost",
user="user",
password="password",
database="database",
raw=True
)
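Note that raw=True skips type conversion for every column, not only the BLOB, so text and numeric columns also arrive as raw bytes. A minimal sketch of decoding only the columns you know hold text; decode_raw_row is a hypothetical helper, and it assumes the raw cursor returns bytes or bytearray values, with None for SQL NULL.

```python
def decode_raw_row(row, text_columns, encoding="utf-8"):
    """Decode selected columns of a raw row and leave the rest as bytes.

    row: tuple of bytes/bytearray values (assumed None for SQL NULL).
    text_columns: set of column indexes that hold text.
    """
    decoded = []
    for index, value in enumerate(row):
        if value is None:
            decoded.append(None)  # SQL NULL stays None
        elif index in text_columns:
            decoded.append(bytes(value).decode(encoding))  # text column
        else:
            decoded.append(bytes(value))  # binary data, left untouched
    return tuple(decoded)


row = (bytearray(b"alice"), bytearray(b"\x9a\x99\xb9\x3f"), None)
print(decode_raw_row(row, text_columns={0}))
# ('alice', b'\x9a\x99\xb9?', None)
```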
Answer 2 (score: 0)
Apparently this is a known issue with Python's 'mysql' module. Try using 'pymysql' instead.
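A minimal sketch of the PyMySQL route, assuming PyMySQL is installed (pip install pymysql) and the question's test_table exists; the function name and its parameters are hypothetical. PyMySQL should return BLOB columns as Python 3 bytes without trying to decode them.

```python
def fetch_blobs_pymysql(host, user, password, database,
                        query="select value from test_table"):
    """Fetch BLOB values with PyMySQL instead of MySQL Connector."""
    import pymysql  # imported lazily so this module loads without the driver

    connection = pymysql.connect(host=host, user=user, password=password,
                                 database=database)
    try:
        with connection.cursor() as cursor:
            cursor.execute(query)
            # Each row is a tuple; the BLOB column is expected to be bytes.
            return [row[0] for row in cursor.fetchall()]
    finally:
        connection.close()
```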
Answer 3 (score: 0)
Traceback (most recent call last):
File "demo.py", line 16, in <module>
cursor.execute(query, ())
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte '0xff ... ' in position 0: invalid start byte
Versions used:
$ python --version
Python 2.7.10
>>> mysql.connector.__version__
'8.0.15'
Python code used:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import mysql.connector
conn = mysql.connector.connect(
user='asdf',
password='asdf',
host='1.2.3.4',
database='the_db',
connect_timeout=10)
cursor = conn.cursor(buffered=True) #error is raised here
try:
    query = ("SELECT data_blob FROM blog.cmd_table")
    cursor.execute(query, ())
except mysql.connector.Error as err:  # error is caught here
    print(err)  # and printed
The BLOB data comes from a Python variable holding raw binary bytes, filled by Python's open():
def read_file_as_blob(filename):
    # r stands for read
    # b stands for binary
    with open(filename, 'rb') as f:
        data = f.read()
    return data
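To check that bytes read this way survive a store-and-fetch round trip unchanged, here is a self-contained sketch that uses the standard-library sqlite3 module as a stand-in for MySQL (an assumption: sqlite3 uses ? placeholders where MySQL Connector uses %s, and no MySQL server is involved).

```python
import os
import sqlite3
import tempfile


def read_file_as_blob(filename):
    # 'rb' = read in binary mode: returns bytes, no decoding is applied
    with open(filename, "rb") as f:
        return f.read()


# Write all 256 byte values (not valid UTF-8) to a temporary file.
payload = bytes(range(256))
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

try:
    blob = read_file_as_blob(path)
    conn = sqlite3.connect(":memory:")
    conn.execute("create table cmd_table (data_blob blob)")
    # Pass the bytes as a query parameter; never format them into the SQL string.
    conn.execute("insert into cmd_table values (?)", (blob,))
    (stored,) = conn.execute("select data_blob from cmd_table").fetchone()
    conn.close()
finally:
    os.remove(path)

print(stored == payload)  # True
```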
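So the problem lies in the chain of encoding conversions: the data in the file -> the encoding of the MySQL BLOB -> how MySQL reads that BLOB back and converts it to UTF-8.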
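Solution 1: exactly as AHalvar said, set the use_pure=True parameter and pass it to mysql.connector.connect( ... ). Then it mysteriously works. But a good programmer will notice that deferring to mysterious incantations is a bad code smell; Brownian-motion fixes lead to technical debt.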
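Solution 2: encode your data early and often, and prevent the double re-encoding and double decoding that cause these problems. Lock it into a common encoding format as early as possible.
For me, the satisfying solution was to force UTF-8 encoding early in the process, and to enforce UTF-8 everywhere.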
data.encode('UTF-8')
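In Python 3 this advice needs one distinction: only str has .encode(), while the bytes returned by open(..., 'rb') have no .encode() method and are already binary, so they should be stored as-is. A minimal sketch of "encode early" under that assumption; to_blob and blob_to_text are hypothetical helper names.

```python
def to_blob(value):
    """Normalize a value to bytes before storing it in a BLOB column."""
    if isinstance(value, str):
        return value.encode("utf-8")  # text: encode exactly once, as UTF-8
    if isinstance(value, (bytes, bytearray, memoryview)):
        return bytes(value)  # already binary: store unchanged
    raise TypeError("cannot store %s as a BLOB" % type(value).__name__)


def blob_to_text(blob):
    """Decode a BLOB that is known to hold UTF-8 text."""
    return bytes(blob).decode("utf-8")


print(to_blob("héllo"))                # b'h\xc3\xa9llo'
print(blob_to_text(to_blob("héllo")))  # héllo
```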
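The Unicode "pile of poo" character sums up my feelings about babysitting character encodings across various devices, operating systems and encoding schemes.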