Question

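I am using df.to_sql(con=con_mysql, name='testdata', if_exists='replace', flavor='mysql') to export a data frame into MySQL. However, I found that columns with long string content (such as url) are truncated to 63 characters. I get the following warning from the IPython notebook when I export: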

/usr/local/lib/python2.7/site-packages/pandas/io/sql.py:248: Warning: Data truncated for column 'url' at row 3
  cur.executemany(insert_query, data)

There are other warnings of the same form for different rows.

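Is there a tweak that would let me export the full data correctly? I could set up the correct schema in MySQL first and then export into it, but I am hoping for a tweak that makes this work straight from Python.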

Answer 1

If you are using pandas 0.13.1 or older, this limit of 63 characters is indeed hard-coded, because of this line in the code: https://github.com/pydata/pandas/blob/v0.13.1/pandas/io/sql.py#L278

As a workaround, you could monkeypatch the function get_sqltype:

import numpy as np
from datetime import date, datetime
from pandas.io import sql

def get_sqltype(pytype, flavor):
    # Default type for anything not matched below (e.g. strings).
    sqltype = {'mysql': 'VARCHAR (63)',    # <-- change this value to something sufficiently high
               'sqlite': 'TEXT'}

    if issubclass(pytype, np.floating):
        sqltype['mysql'] = 'FLOAT'
        sqltype['sqlite'] = 'REAL'
    if issubclass(pytype, np.integer):
        sqltype['mysql'] = 'BIGINT'
        sqltype['sqlite'] = 'INTEGER'
    if issubclass(pytype, np.datetime64) or pytype is datetime:
        sqltype['mysql'] = 'DATETIME'
        sqltype['sqlite'] = 'TIMESTAMP'
    if pytype is date:
        sqltype['mysql'] = 'DATE'
        sqltype['sqlite'] = 'TIMESTAMP'
    if issubclass(pytype, np.bool_):
        sqltype['sqlite'] = 'INTEGER'

    return sqltype[flavor]

# Replace pandas' own function with the patched version (pandas 0.13.1 or older).
sql.get_sqltype = get_sqltype

After that, your original call should just work:

df.to_sql(con=con_mysql, name='testdata', if_exists='replace', flavor='mysql')
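To choose a VARCHAR length that is high enough, it can help to measure the longest string in each column before exporting. The following is only a minimal sketch: the helper name max_string_lengths and the sample data are hypothetical and not part of pandas.

```python
import pandas as pd


def max_string_lengths(df):
    """Return {column: length of its longest string value}.

    Hypothetical helper, not part of pandas. Columns that contain
    no string values are left out of the result.
    """
    lengths = {}
    for col in df.columns:
        # Only str values are measured; numbers and missing values are skipped.
        strings = [v for v in df[col] if isinstance(v, str)]
        if strings:
            lengths[col] = max(len(v) for v in strings)
    return lengths


# Made-up sample data: one URL is longer than the 63-character limit.
sample = pd.DataFrame({
    'id': [1, 2],
    'url': ['http://example.com/' + 'a' * 100, 'http://example.com/'],
})
print(max_string_lengths(sample))
```

Any column whose reported length exceeds 63 would be truncated by the hard-coded type, so the value in the patched function should be at least that large.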

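Starting with pandas 0.14, the sql module uses sqlalchemy under the hood. Strings are converted to the sqlalchemy TEXT type, which maps to the MySQL TEXT type (not VARCHAR), so you can also store strings longer than 63 characters: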

import sqlalchemy

engine = sqlalchemy.create_engine('mysql://scott:tiger@localhost/foo')
df.to_sql('testdata', engine, if_exists='replace')

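The issue only persists if you still pass a DBAPI connection instead of a sqlalchemy engine, but that option is deprecated, and it is recommended to pass a sqlalchemy engine to to_sql.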
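If TEXT is not the column type you want, later pandas releases let to_sql take a dtype argument that maps column names to sqlalchemy types. The following is a sketch under stated assumptions, not a definitive recipe: it assumes a pandas version whose to_sql accepts dtype, it uses an in-memory SQLite engine only so that the snippet runs without a MySQL server, and the sample data is made up. With MySQL you would pass your own engine instead.

```python
import pandas as pd
import sqlalchemy
from sqlalchemy.types import VARCHAR

# Assumption: in-memory SQLite stands in for the MySQL engine here.
engine = sqlalchemy.create_engine('sqlite://')

# Made-up sample data with a URL longer than 63 characters.
df = pd.DataFrame({'url': ['http://example.com/' + 'a' * 100]})

# Request an explicit VARCHAR(255) column instead of the default string type.
df.to_sql('testdata', engine, if_exists='replace', index=False,
          dtype={'url': VARCHAR(255)})

# Read the table back and compare against the original value.
stored = pd.read_sql_table('testdata', engine)
print(stored['url'][0] == df['url'][0])
```

Note that SQLite does not enforce the declared length, so this only illustrates the shape of the call. On MySQL, a value longer than the declared length is truncated with a warning or rejected with an error, depending on the server's SQL mode.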

Answer 2

Inspired by @joris's answer, I decided to hard-code the change into the pandas source and recompile.

cd /usr/local/lib/python2.7/dist-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/io
sudo pico sql.py

Changed line 871

'mysql': 'VARCHAR (63)',

to

'mysql': 'VARCHAR (255)',

Then recompiled the file

sudo python -m py_compile sql.py

I relaunched my script and the _to_sql() function wrote a table. (I expected the recompile to break pandas, but it does not seem to have.)

Here is my script that writes the data frame to MySQL, for reference.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sqlalchemy 
from sqlalchemy import create_engine
df = pd.read_csv('10k.csv')
## ... dataframe munging
df = df.where(pd.notnull(df), None) # workaround for NaN bug
engine = create_engine('mysql://user:password@localhost:3306/dbname')
con = engine.connect().connection
df.to_sql("issues", con, 'mysql', if_exists='replace', index=True, index_label=None)

pandas to_sql truncates my data
