Exporting a Pandas DataFrame with Unicode characters to MySQL

Date: 2017-08-10 16:13:20

Tags: mysql python-3.x pandas unicode sqlalchemy

I have been trying to export a large pandas DataFrame to a MySQL database using DataFrame.to_sql and the pymysql driver. However, some columns of the DataFrame contain Unicode characters, and some of these characters cause warnings during the export and are converted to ?.

I managed to reproduce the problem with this example (database login details removed):

    import pandas as pd
    import sqlalchemy
    import pymysql

    # login_info = (user, password, host, database); removed from the post
    engine = sqlalchemy.create_engine(
        'mysql+pymysql://{}:{}@{}/{}?charset=utf8'.format(*login_info),
        encoding='utf-8')

    df_test = pd.DataFrame([[u'\u010daj', 2],
                            ['čaj', 2],
                            ['špenát', 4],
                            ['květák', 7],
                            ['kuře', 1]],
                           columns=['a', 'b'])

    df_test.to_sql('test', engine, if_exists='replace', index=False,
                   dtype={'a': sqlalchemy.types.UnicodeText()})

The first two rows of the DataFrame should be identical; they are only defined differently.

I get the following warnings, and the problematic characters (č, ě, ř) are rendered as ? in the resulting table:

/usr/local/lib/python3.6/site-packages/pymysql/cursors.py:166: Warning: (1366, "Incorrect string value: '\\xC4\\x8Daj' for column 'a' at row 1")
  result = self._query(query)
/usr/local/lib/python3.6/site-packages/pymysql/cursors.py:166: Warning: (1366, "Incorrect string value: '\\xC4\\x8Daj' for column 'a' at row 2")
  result = self._query(query)
/usr/local/lib/python3.6/site-packages/pymysql/cursors.py:166: Warning: (1366, "Incorrect string value: '\\xC4\\x9Bt\\xC3\\xA1k' for column 'a' at row 4")
  result = self._query(query)
/usr/local/lib/python3.6/site-packages/pymysql/cursors.py:166: Warning: (1366, "Incorrect string value: '\\xC5\\x99e' for column 'a' at row 5")
  result = self._query(query)
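The byte strings quoted in these warnings are valid UTF-8, which suggests that pymysql sent correctly encoded text and the server then failed to store it in column a. The snippet below only decodes the bytes copied from the warnings with Python's standard codec and assumes nothing about the database. Judging by row 4, MySQL quotes the value starting at the first character it could not store, so květák shows up as ěták.

```python
# Byte strings copied from the MySQL 1366 warnings above, keyed by row number.
rejected = {
    1: b'\xC4\x8Daj',
    2: b'\xC4\x8Daj',
    4: b'\xC4\x9Bt\xC3\xA1k',
    5: b'\xC5\x99e',
}

for row, raw in rejected.items():
    # decode() raises UnicodeDecodeError if the bytes are not valid UTF-8
    text = raw.decode('utf-8')
    # Print the decoded text and the code point of its first character,
    # the one MySQL refused to store.
    print(row, text, 'U+{:04X}'.format(ord(text[0])))

# Output:
# 1 čaj U+010D
# 2 čaj U+010D
# 4 ěták U+011B
# 5 ře U+0159
```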

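The resulting database table test looks like this:

    a       b
    ?aj     2
    ?aj     2
    špenát  4
    kv?ták  7
    ku?e    1

Curiously, the š, á and ž characters (and others in my full dataset) are handled correctly, so the problem seems to affect only a subset of Unicode characters. As you can see above, I also tried setting UTF-8 in the create_engine call (charset=utf8 in the URL and encoding='utf-8'), but it did not help.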
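One possible explanation, not confirmed anywhere in this thread: if the test table was created with latin1, MySQL's default character set before 8.0, which MySQL actually implements as Windows cp1252, then exactly the characters missing from cp1252 would be rejected. The sketch below uses Python's built-in cp1252 codec as a stand-in for that column character set; the latin1 column is an assumption.

```python
# Assumption (not confirmed in the thread): column a uses MySQL's latin1,
# which MySQL defines as cp1252. Python's cp1252 codec stands in for it.
chars = ['\u010d', '\u011b', '\u0159', '\u0161', '\u00e1', '\u017e']  # č ě ř š á ž


def fits_cp1252(ch):
    """Return True if ch can be encoded in cp1252."""
    try:
        ch.encode('cp1252')
        return True
    except UnicodeEncodeError:
        return False


for ch in chars:
    print(ch, 'U+{:04X}'.format(ord(ch)), fits_cp1252(ch))

# Output:
# č U+010D False
# ě U+011B False
# ř U+0159 False
# š U+0161 True
# á U+00E1 True
# ž U+017E True
```

The split matches the observed behaviour (č, ě, ř fail; š, á, ž survive), but running SHOW CREATE TABLE test on the server is what would actually confirm or rule out a latin1 column.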

2 answers:

Answer 0 (score: 0)

pymysql:

import pymysql
# charset="utf8mb4" sets the connection character set to full 4-byte UTF-8
con = pymysql.connect(host='127.0.0.1', port=3306,
                      user='root', passwd='******',
                      charset="utf8mb4")

SQLAlchemy:

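    # SQLAlchemy 1.4+ builds URLs with URL.create(); foo.db_host, db_schema
    # and foo.db_config are the answerer's own placeholders.
    db_url = sqlalchemy.engine.url.URL.create(
        drivername='mysql+pymysql', host=foo.db_host,
        database=db_schema,
        query={'read_default_file': foo.db_config, 'charset': 'utf8mb4'})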

See "Best practice" in http://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored. Its explanation for question marks (?) is:

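  • The bytes to be stored are not encoded as utf8/utf8mb4. Fix this.
  • The column in the database is not CHARACTER SET utf8 (or utf8mb4). Fix this.
  • Also, check that the connection used for reading is UTF-8.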

(Note: the CHARACTER SETs utf8 and utf8mb4 are interchangeable for European languages.)
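The note holds because MySQL's utf8 (an alias for utf8mb3) stores at most 3 bytes per character, while every character in this question needs at most 2 bytes in UTF-8; only characters outside the Basic Multilingual Plane, such as most emoji, need the fourth byte that only utf8mb4 provides. The sketch below simply measures UTF-8 lengths with Python's standard codec; the emoji sample is an illustrative addition, not from the thread.

```python
def utf8_len(ch):
    """Number of bytes ch takes up when encoded as UTF-8."""
    return len(ch.encode('utf-8'))


# č ě ř š á ž from the question, plain 'a', and one emoji for contrast
# (the emoji is an illustrative addition, not from the thread).
samples = ['\u010d', '\u011b', '\u0159', '\u0161', '\u00e1', '\u017e', 'a', '\U0001F600']

for ch in samples:
    n = utf8_len(ch)
    # MySQL's utf8 (utf8mb3) holds up to 3 bytes per character; utf8mb4 up to 4.
    print('U+{:04X}'.format(ord(ch)), n, 'fits utf8' if n <= 3 else 'needs utf8mb4')
```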

Are these Czech characters?

Answer 1 (score: 0)

I ran into the same problem, also using the pymysql driver.

I switched the MySQL driver to mysql-connector, and the 1366 warnings went away.

Install the mysql-connector driver:

pip install mysql-connector

Then set up the SQLAlchemy engine like this:

create_engine('mysql+mysqlconnector://root:tj1996@localhost:3306/new?charset=utf8mb4')