Question

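I found a pretty decent way to:

read a table from an SQL database
rename its columns using a dictionary (read from a YAML file)
write the table out to another database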

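The only problem is that as the tables get large (10 columns x several million rows), reading the whole table into pandas uses so much memory that the process gets killed.

There must be an easier way. I looked at ALTER TABLE statements, but they also seem quite complicated and would not handle the copy into another database. Any ideas on how to do the same thing without using so much memory? I have been using pandas as a crutch because my SQL is weak.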

import pandas as pd
import sqlite3
def translate2generic(sourcedb, targetdb, sourcetable,
                      targettable, toberenamed):
    """Change table's column names to fit generic api keys.

    :param sourcedb: Path to source db
    :param targetdb: Path to target db
    :param sourcetable: Name of table to be translated in source
    :param targettable: Name of the newly to be created table in targetdb
    :param toberenamed: dictionary of translations
    :return: New column names in target db
    """
    sourceconn = sqlite3.connect(sourcedb)
    targetconn = sqlite3.connect(targetdb)
    table = pd.read_sql_query('select * from ' + sourcetable, sourceconn) #this is the line causing the crash

    # read dict in the format {"oldcol1name": "newcol1name", "oldcol2name": "newcol2name"}
    rename = {v: k for k, v in toberenamed.items()} 


    # rename columns
    generic_table = table.rename(columns=rename)

    # Write table to new database
    generic_table.to_sql(targettable, targetconn, if_exists="replace")
    targetconn.close()
    sourceconn.close()

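I have also looked at solutions such as this one, but they assume you know the column types.

An elegant solution would be much appreciated.

Edit: I know sqlite has had a way to do this since version 3.25.0, released in September, but I am still on version 2.6.0.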

Answer 1

To elaborate on my comment...

If you have a table in foo.db and want to copy that table's data into a new table in bar.db with different column names:

$ sqlite3 foo.db
sqlite> ATTACH 'bar.db' AS bar;
sqlite> CREATE TABLE bar.newtable(newcolumn1, newcolumn2);
sqlite> INSERT INTO bar.newtable SELECT oldcolumn1, oldcolumn2 FROM main.oldtable;
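
The same steps can be driven from Python with only the standard sqlite3 module, so the rows are copied inside SQLite and never loaded into Python. The following is a minimal sketch, not a tested drop-in replacement: the names copy_renamed and quote_ident and the schema alias dst are made up for illustration, and it assumes the dictionary maps old column names to new ones. It reads the declared column types from PRAGMA table_info, so they do not have to be known in advance; indexes and constraints are not copied.

```python
import sqlite3


def quote_ident(name):
    """Quote an SQLite identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def copy_renamed(sourcedb, targetdb, sourcetable, targettable, rename):
    """Copy sourcetable into targetdb as targettable with renamed columns.

    rename is assumed to map old column names to new ones; columns that
    are not in the dict keep their name. Returns the new column names.
    """
    conn = sqlite3.connect(sourcedb)
    try:
        # Make the target file visible on the same connection.
        conn.execute("ATTACH DATABASE ? AS dst", (targetdb,))

        # Each row is (cid, name, type, notnull, dflt_value, pk).
        info = conn.execute(
            "PRAGMA main.table_info(%s)" % quote_ident(sourcetable)
        ).fetchall()
        if not info:
            raise ValueError("no such table: %s" % sourcetable)

        old_cols = [row[1] for row in info]
        new_cols = [rename.get(col, col) for col in old_cols]

        # Reuse the declared type of every source column.
        col_defs = ", ".join(
            "%s %s" % (quote_ident(new), row[2])
            for new, row in zip(new_cols, info)
        )
        select_list = ", ".join(quote_ident(col) for col in old_cols)
        target = "dst." + quote_ident(targettable)

        # Mirrors if_exists="replace" from the pandas version.
        conn.execute("DROP TABLE IF EXISTS " + target)
        conn.execute("CREATE TABLE %s (%s)" % (target, col_defs))
        # The copy runs entirely inside SQLite.
        conn.execute(
            "INSERT INTO %s SELECT %s FROM main.%s"
            % (target, select_list, quote_ident(sourcetable))
        )
        conn.commit()
        return new_cols
    finally:
        conn.close()
```

With the example names from above, a call would look like copy_renamed("foo.db", "bar.db", "oldtable", "newtable", {"oldcolumn1": "newcolumn1", "oldcolumn2": "newcolumn2"}).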

Copying a table and renaming columns in sqlite / pandas in a less memory-intensive way
