Question

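This is for CS50 Web Project 1, where I have to import books.csv, a 5000-row file (containing isbn, title, author and year). The problem is that the import itself takes far too long (about 10 rows per second), which I don't think is normal. How can I speed it up?

I created a table with isbn, title, author and year columns, all of type varchar. I am using PostgreSQL. Then I wrote import.py, which looks like this: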

import csv
import os

from sqlalchemy import create_engine, text
from sqlalchemy.orm import scoped_session, sessionmaker

# DATABASE_URL must point at the PostgreSQL database
engine = create_engine(os.getenv("DATABASE_URL"))
db = scoped_session(sessionmaker(bind=engine))

def main():
    # One INSERT statement is sent to the server per CSV row
    with open("books.csv", newline="") as f:
        reader = csv.reader(f)
        for ISBN, title, author, year in reader:
            db.execute(
                text("INSERT INTO books (ISBN, title, author, year) "
                     "VALUES (:ISBN, :title, :author, :year)"),
                {"ISBN": ISBN, "title": title, "author": author, "year": year},
            )
    # Commit once, after every row has been sent
    db.commit()

if __name__ == "__main__":
    main()

I expected the import to finish in under a minute, but it currently takes about 30-40 minutes.

Answer 1

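Given the performance you are observing (10 rows per second), I suspect the latency of each request to the database is high (check it with ping). In that case, inserting multiple rows in a single query with INSERT .. VALUES (..), (..), (..) should help a lot.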

To do that, you have to:

Pass a list of the values to insert to execute: db.execute(sql_query, list_of_dicts_here)
Assuming you connect to Postgres with psycopg2, tell SQLAlchemy to use psycopg2's "fast execution helpers" by passing executemany_mode='values' to create_engine
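The two steps above could be sketched roughly as follows. This is a minimal sketch under some assumptions, not a definitive implementation. The helper names read_books and import_books are hypothetical. The Table definition simply mirrors the four varchar columns described in the question. The sketch uses a Core insert() construct rather than a raw SQL string because, as far as I know, the psycopg2 fast execution helpers are only applied to those. The accepted executemany_mode values also differ between SQLAlchemy releases ('values' is the spelling given in this answer), so check the documentation for your installed version.

```python
import csv


def read_books(path):
    """Read the CSV file into a list of dicts, one dict per row."""
    with open(path, newline="") as f:
        return [
            {"isbn": isbn, "title": title, "author": author, "year": year}
            for isbn, title, author, year in csv.reader(f)
        ]


def import_books(database_url, path, executemany_mode="values"):
    """Insert every CSV row with a single executemany-style call.

    Assumption: psycopg2 is the driver and the installed SQLAlchemy
    version accepts this executemany_mode value.
    """
    # Imported here so read_books stays usable without SQLAlchemy installed.
    from sqlalchemy import Column, MetaData, String, Table, create_engine

    # Hypothetical table definition mirroring the question's schema.
    books = Table(
        "books",
        MetaData(),
        Column("isbn", String),
        Column("title", String),
        Column("author", String),
        Column("year", String),
    )

    engine = create_engine(database_url, executemany_mode=executemany_mode)
    rows = read_books(path)
    if rows:
        # engine.begin() commits on success and rolls back on error.
        with engine.begin() as conn:
            # Passing a list of dicts makes this one batched call
            # instead of one round trip per row.
            conn.execute(books.insert(), rows)
    return len(rows)
```

With this in place, the import would be a single call such as import_books(os.getenv("DATABASE_URL"), "books.csv"). If books.csv starts with a header line, that line would need to be skipped before inserting.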

If your file were larger, I would suggest bulk loading with "COPY FROM", but for only 5000 rows that is not worth it.
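For reference, a COPY-based load for a larger file might look roughly like the sketch below. It goes through psycopg2 directly, using its cursor.copy_expert method. The helper names copy_statement and copy_books are hypothetical, and the sketch assumes the CSV has no header line and that the connection URL is in a form libpq accepts (a plain postgresql://... URL, not a SQLAlchemy-specific one such as postgresql+psycopg2://...).

```python
def copy_statement(table, columns):
    """Build a COPY ... FROM STDIN statement for CSV input."""
    cols = ", ".join(columns)
    return f"COPY {table} ({cols}) FROM STDIN WITH (FORMAT csv)"


def copy_books(database_url, path):
    """Stream the whole CSV file to the server in one COPY command."""
    # Imported here so copy_statement stays usable without psycopg2 installed.
    import psycopg2

    sql = copy_statement("books", ["isbn", "title", "author", "year"])
    conn = psycopg2.connect(database_url)
    try:
        # "with conn" commits on success and rolls back on error;
        # it does not close the connection, hence the finally block.
        with conn, conn.cursor() as cur, open(path, newline="") as f:
            cur.copy_expert(sql, f)
    finally:
        conn.close()
```

The file is sent to the server as a stream rather than as individual INSERT statements, which is why this approach tends to pay off mainly for much larger files.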
