How to read and write chunkwise with pandas and sqlalchemy

Time: 2016-12-06 09:54:52

Tags: python pandas sqlalchemy

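I am using Python (version 3.4.4), pandas (version 0.19.1) and sqlalchemy (version 1.1.4) to read chunkwise from a large SQL table, preprocess the chunks and write them to a different SQL table. Reading chunk by chunk with pd.read_sql_query(verses_sql, conn, chunksize=10), where pd is the pandas import, verses_sql is the SQL query and conn is the database connection (created with engine.connect()), works fine if I do this: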

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('mssql+pymssql://<username>:<password>@<database>:1433/<FirstTable>')
conn = engine.connect()

verses_sql = '''SELECT [KA_Lang] FROM [dbo].[<FirstTable>]'''

for chunk in pd.read_sql_query(verses_sql, conn, chunksize=10):
    chunk['KA_Lang'] = chunk['KA_Lang'].str.replace(r'[^a-zA-Z\u00C0-\u02AF]'," ")
    chunk['KA_Lang'] = chunk['KA_Lang'].str.replace(r'\s\s+', " ")
    chunk['KA_Lang'] = chunk['KA_Lang'].str.lower()
    print(chunk['KA_Lang'].head(1))
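To make the three preprocessing steps easier to follow, here is a minimal sketch that applies the same two patterns to a single string with the standard `re` module instead of pandas. The helper name `clean_text` and the sample string are assumptions made for illustration; they are not part of the original code.

```python
import re

# The same two patterns used in the pandas snippet above.
NON_LETTERS = re.compile(r'[^a-zA-Z\u00C0-\u02AF]')  # anything outside these letter ranges
MULTI_SPACE = re.compile(r'\s\s+')                   # runs of two or more whitespace characters


def clean_text(text):
    """Hypothetical helper: apply the three cleaning steps to one string."""
    text = NON_LETTERS.sub(" ", text)   # step 1: replace non-letters with a space
    text = MULTI_SPACE.sub(" ", text)   # step 2: collapse repeated whitespace
    return text.lower()                 # step 3: lowercase


sample = "Hello,  Wörld 42!"
cleaned = clean_text(sample)
print(repr(cleaned))
```

Note that characters replaced at the start or end of a string leave a single space behind, so a final `.strip()` may be worth adding if that matters for the target table.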

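The problem: if I try to write the preprocessed chunk chunk['KA_Lang'] to a second SQL table, call it SecondTable, only the first chunk of 10 rows is written. The iteration stops there. Here is the adapted code: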

import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy import Table, Column, Integer, String, MetaData

engine = create_engine('mssql+pymssql://<username>:<password>@<database>:1433/<FirstTable>')
conn = engine.connect()

verses_sql = '''SELECT [KA_Lang] FROM [dbo].[<FirstTable>]'''

for chunk in pd.read_sql_query(verses_sql, conn, chunksize=10):
    chunk['KA_Lang'] = chunk['KA_Lang'].str.replace(r'[^a-zA-Z\u00C0-\u02AF]'," ")
    chunk['KA_Lang'] = chunk['KA_Lang'].str.replace(r'\s\s+', " ")
    chunk['KA_Lang'] = chunk['KA_Lang'].str.lower()
    print(chunk['KA_Lang'].head(1))

    chunk.to_sql('<SecondTable>', conn, if_exists='append', index=False)

conn.close()

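How can I continuously read chunks from one SQL table and write them to another SQL table? And why does the iteration over the chunks stop as soon as I include chunk.to_sql('<SecondTable>', conn, if_exists='append', index=False)?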

1 Answer:

Answer 0 (score: 1)

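After several days of trying different workarounds, I solved the problem. It turned out to be easy. To continuously read chunks from one SQL table and write them to another SQL table, two different connections need to be defined: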

engine = create_engine('mssql+pymssql://<username>:<password>@<database>:1433/<FirstTable>')
engine1 = create_engine('mssql+pymssql://<username>:<password>@<database>:1433/<FirstTable>')
conn = engine.connect()
conn1 = engine1.connect()

The line of code that writes chunk to the second table needs to be adapted accordingly:

chunk.to_sql('<SecondTable>', conn1, if_exists='append', index=False)

Done!
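
For readers who want to try the pattern end to end, here is a minimal sketch of the whole loop with one connection for reading and a separate one for writing. It rests on several assumptions that are not in the original post: two in-memory SQLite databases stand in for the SQL Server tables, plain `sqlite3` connections are used instead of SQLAlchemy ones, a recent pandas version is installed, and the helper name `copy_in_chunks` and the sample rows are made up. Only the lowercasing step is shown; the regex cleanup from the question would go at the same point.

```python
import sqlite3

import pandas as pd


def copy_in_chunks(read_conn, write_conn, query, target_table, chunksize=10):
    """Hypothetical helper: stream `query` from read_conn chunk by chunk,
    preprocess each chunk, and append it to `target_table` via write_conn."""
    total = 0
    for chunk in pd.read_sql_query(query, read_conn, chunksize=chunksize):
        # Preprocessing placeholder: only lowercasing is applied here.
        chunk['KA_Lang'] = chunk['KA_Lang'].str.lower()
        # The write goes through its own connection, so the pending
        # result set on read_conn is left untouched.
        chunk.to_sql(target_table, write_conn, if_exists='append', index=False)
        total += len(chunk)
    return total


# Stand-in data (assumption): SQLite replaces the SQL Server tables.
read_conn = sqlite3.connect(":memory:")
write_conn = sqlite3.connect(":memory:")
read_conn.execute("CREATE TABLE FirstTable (KA_Lang TEXT)")
read_conn.executemany(
    "INSERT INTO FirstTable VALUES (?)",
    [("Verse %d, Text!" % i,) for i in range(25)],
)
read_conn.commit()

rows_written = copy_in_chunks(
    read_conn, write_conn, "SELECT KA_Lang FROM FirstTable", "SecondTable"
)
rows_in_target = write_conn.execute(
    "SELECT COUNT(*) FROM SecondTable"
).fetchone()[0]
print(rows_written, rows_in_target)

read_conn.close()
write_conn.close()
```

One likely explanation for the original behaviour, which the answer itself does not state, is that pymssql allows only one cursor with an active query per connection, so calling to_sql on the reading connection would discard the rows still pending from the chunked read. If that is the cause, calling engine.connect() twice on a single engine may also be enough, since each call hands out its own connection; this has not been verified against the setup in the question.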