我有一个MySQL表,XML内容存储在longtext字段中,编码为utf8mb4_general_ci
数据库表
我想使用Python脚本从transcript字段读取XML数据,修改元素,然后将值写回数据库。
当我尝试使用ElementTree.tostring
将XML内容转换为元素时,我收到以下编码错误:
Traceback (most recent call last):
File "ImageProcessing.py", line 33,
in <module> root = etree.fromstring(row[1])
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1300,
in XML parser.feed(text)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etre e/ElementTree.py", line 1640,
in feed self._parser.Parse(data, 0)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2014' in position 9568: ordinal not in range(128)
代码:
import datetime
import mysql.connector
import xml.etree.ElementTree as etree
# Creates the config parameters, connects
# to the database and creates a cursor
config = {
'user': 'username',
'password': 'password',
'host': '127.0.0.1',
'database': 'dbname',
'raise_on_warnings': True,
'use_unicode': True,
'charset': 'utf8',
}
cnx = mysql.connector.connect(**config)
cursor = cnx.cursor()
# Structures the SQL query
query = ("SELECT * FROM transcription")
# Executes the query and fetches the first row
cursor.execute(query)
row = cursor.fetchone()
while row is not None:
print(row[0])
#Some of the things I have tried to resolve the encoding issue
#parser = etree.XMLParser(encoding="utf-8")
#root = etree.fromstring(row[1], parser=parser)
#row[1].encode('ascii', 'ignore')
#Line where the encoding error is being thrown
root = etree.fromstring(row[1])
for img in root.iter('img'):
refno = img.text
img.attrib['href']='http://www.link.com/images.jsp?doc=' + refno
print img.tag, img.attrib, img.text
row = cursor.fetchone()
cursor.close()
cnx.close()
答案 0 :(得分:0)
你已经完成了所有设置并且你的数据库连接正在返回Unicodes,这是一件好事。
不幸的是,ElementTree的fromstring()
需要字节str而不是Unicode。这是因为ElementTree可以使用XML标头中定义的编码对其进行解码。
你需要改用它:
utf_8_xml = row[1].encode("utf-8")
root = etree.fromstring(utf_8_xml)