Rows missing when inserting data into a ClickHouse database with pandas to_sql()

Time: 2018-12-26 02:29:15

Tags: python-3.x sqlalchemy clickhouse

This is my first time using sqlalchemy and pandas to insert data into a ClickHouse database.

When I insert the data with the clickhouse CLI, it works fine. When I do the same thing through sqlalchemy, one row goes missing and I don't know why.

Am I doing something wrong?

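import pandas as pd
from sqlalchemy import MetaData, create_engine

# df is the DataFrame created earlier and uri is the ClickHouse connection string;
# their definitions, and the import of make_session, were not shown in the question

engine = create_engine(uri)
session = make_session(engine)
metadata = MetaData(bind=engine)
metadata.reflect(bind=engine)
conn = engine.connect()
df.to_sql('test', conn, if_exists='append', index=False)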
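One way to confirm that a row really is missing is to compare the DataFrame length with the table's row count after the insert. The following is only a minimal sketch: it uses an in-memory SQLite database as a stand-in so it runs anywhere, and the helper name `count_rows` and the sample data are made up for illustration. Against ClickHouse, the same comparison would be run over the engine or connection from the snippet above.

```python
import sqlite3

import pandas as pd


def count_rows(conn, table):
    """Return the number of rows currently stored in `table`.

    The table name is interpolated directly into the SQL text,
    so only pass trusted names.
    """
    result = pd.read_sql_query(f"SELECT COUNT(*) AS n FROM {table}", conn)
    return int(result["n"].iloc[0])


# Stand-in database (assumption: SQLite instead of ClickHouse, for illustration only)
conn = sqlite3.connect(":memory:")

# Illustrative data, not the asker's DataFrame
df = pd.DataFrame({"year": [1994, 1995, 1996, 1997],
                   "first_name": ["Vova", "Anja", "Vasja", "Petja"]})

df.to_sql("test", conn, if_exists="append", index=False)

# A non-zero value means some rows did not reach the table
missing = len(df) - count_rows(conn, "test")
print("rows missing after insert:", missing)
```

If the table already held rows before the insert, the count taken beforehand would have to be subtracted as well.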

1 Answer:

Answer 0 (score: 0)

Let's try it:

import pandas as pd
from infi.clickhouse_orm.engines import Memory
from infi.clickhouse_orm.fields import UInt16Field, StringField
from infi.clickhouse_orm.models import Model
from sqlalchemy import create_engine


# define the ClickHouse table schema
class Test_Humans(Model):
    year = UInt16Field()
    first_name = StringField()
    engine = Memory()


engine = create_engine('clickhouse://default:@localhost/test')

# create table
with engine.connect() as conn:
    conn.connection.create_table(Test_Humans) # https://github.com/Infinidat/infi.clickhouse_orm/blob/master/src/infi/clickhouse_orm/database.py#L142

pdf = pd.DataFrame.from_records([
    {'year': 1994, 'first_name': 'Vova'},
    {'year': 1995, 'first_name': 'Anja'},
    {'year': 1996, 'first_name': 'Vasja'},
    {'year': 1997, 'first_name': 'Petja'},
    # Workaround: sqlalchemy-clickhouse ignores the last item, so append a dummy one
    {}
])

pdf.to_sql('test_humans', engine, if_exists='append', index=False)

Note that sqlalchemy-clickhouse ignores the last item, which is why a dummy item is appended (see the source code and the related issue 10).
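For a DataFrame that already exists, such as the `df` in the question, the same workaround can be applied just before calling `to_sql()`. This is a minimal sketch; the helper name `append_dummy_row` is hypothetical and not part of pandas or sqlalchemy-clickhouse. Keep in mind that the all-missing row upcasts integer columns to float, exactly as the empty `{}` record does in the example above.

```python
import pandas as pd


def append_dummy_row(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `df` with one extra all-NaN row at the end.

    Hypothetical helper: the extra row is the sacrificial "last item"
    that the driver is reported to drop.
    """
    out = df.reset_index(drop=True)
    # Reindexing to one more label than exists adds a row filled with NaN
    return out.reindex(range(len(out) + 1))


pdf = pd.DataFrame.from_records([
    {'year': 1994, 'first_name': 'Vova'},
    {'year': 1995, 'first_name': 'Anja'},
    {'year': 1996, 'first_name': 'Vasja'},
    {'year': 1997, 'first_name': 'Petja'},
])

padded = append_dummy_row(pdf)
print(len(pdf), len(padded))

# Then insert as before (assumes the `engine` from the answer above):
# padded.to_sql('test_humans', engine, if_exists='append', index=False)
```

If the driver stops dropping the last row, the dummy row would be inserted as real data, so the workaround should be removed at that point.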