Migrating/copying data from Postgres to Vertica

Date: 2016-12-01 14:19:59

Tags: python postgresql copy database-migration vertica

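I am trying to migrate/copy data from Postgres to Vertica using Python 3 (if there are other, more user-friendly ways, I would be glad to hear about them). The problem is that the code below only works when I copy a single column from Postgres. If I copy more than one column, nothing is migrated: the table is created in Vertica, but it is empty.

How can I migrate an entire table from Postgres to Vertica?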

import io
import psycopg2

conn = psycopg2.connect()

input = io.StringIO()
cur_postrgres = conn.cursor()
cur_postrgres.copy_expert('''COPY (SELECT id, date from table_1) TO STDOUT;''', input)
cur_postrgres.close()

cur_vertica.execute("DROP TABLE IF EXISTS table_1_temp;")
cur_vertica.connection.commit()
cur_vertica.execute('''CREATE TABLE table_1_temp (
id BIGINT, date TIMESTAMP WITHOUT TIME ZONE);''')
cur_vertica.connection.commit()

#cur_vertica.stdin = input
#input.seek(0)

cur_vertica.copy('''COPY table_1_temp FROM STDIN NULL AS 'null' ''',  input.getvalue())
cur_vertica.execute("COMMIT;")
cur_vertica.close()

2 Answers:

Answer 0 (score: 1)

Another way to copy a Postgres database to Vertica is to use pg_dump. With the tar format, it creates a tar archive containing tab-separated text data files and a SQL script (restore.sql) that you can edit and execute in Vertica.

This can be useful if there are many tables that need to be created. The SQL contains statements like CREATE TABLE, CREATE INDEX, CREATE SEQUENCE, etc. for each table, and includes COPY statements to load each data file.

Vertica is based on PostgreSQL, so the dialects are similar. The restore.sql that pg_dump generates is almost usable as-is: you just need to delete statements that are not relevant, maybe change the schema name, and refine the COPY statements.

pg_dump --format=tar --dbname=mydb --username=myuser --no-owner --verbose --no-privileges > mydata.tar

Optionally, compress the tar file before transferring it:

zip mydata.tar.zip mydata.tar

Copy the archive to a working directory on the Vertica machine:

scp -i ~/.ssh/secret.pem mydata.tar.zip myuser@123.456.345:/data

Log in to the instance, change to the working directory, and unpack the archive:

 ssh -i ~/.ssh/secret.pem myuser@123.456.345
 cd /data
 unzip mydata.tar.zip
 tar -xvf mydata.tar

Now edit the restore.sql file appropriately. I found I needed to:

  • delete statements at the top that aren't relevant to Vertica, such as SET statement_timeout = 0; and COMMENT ON EXTENSION plpgsql

  • delete one of the two COPY statements it generates for each table (one loads from STDIN, the other from a file)

  • edit the remaining COPY statement to add Vertica-specific options like DELIMITER AS E'\t' NULL AS '\N' ABORT ON ERROR;

After that, importing was just executing that file in Vertica:

\i restore_modified.sql
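The manual edits to restore.sql listed above could be scripted. The following is only a rough sketch, not part of the original answer: `adapt_restore_sql` is a hypothetical helper, and it assumes that each statement to drop or edit sits on a single line, that the STDIN variant of each COPY ends with `FROM stdin;`, and that it is followed by a `\.` terminator line. Check these assumptions against your own restore.sql before relying on it.

```python
# Statement prefixes that are PostgreSQL-specific; extend this tuple as needed.
DROP_PREFIXES = ("SET ", "COMMENT ON EXTENSION")

# Vertica-specific COPY options quoted in the answer above (raw string keeps
# the backslashes literal).
VERTICA_COPY_OPTIONS = r"DELIMITER AS E'\t' NULL AS '\N' ABORT ON ERROR"


def adapt_restore_sql(lines):
    """Return the lines of restore.sql adjusted for Vertica (hypothetical helper)."""
    adapted = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(DROP_PREFIXES):
            continue  # PostgreSQL-only session setting or extension comment
        if stripped == "\\.":
            continue  # end-of-data marker belonging to a COPY ... FROM stdin
        if stripped.upper().startswith("COPY ") and stripped.endswith(";"):
            if stripped.lower().endswith("from stdin;"):
                continue  # keep only the COPY that loads from a data file
            # Append the Vertica options before the closing semicolon.
            line = stripped[:-1] + " " + VERTICA_COPY_OPTIONS + ";\n"
        adapted.append(line)
    return adapted
```

To use it, read restore.sql line by line, pass the lines to `adapt_restore_sql`, and write the result to restore_modified.sql. Schema renames and any multi-line statements still need to be reviewed by hand.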

Answer 1 (score: 0)

I believe the default delimiter for COPY in PostgreSQL is a tab, while Vertica's default delimiter is a pipe. You probably need to specify DELIMITER E'\t' on the Vertica COPY, or make the PostgreSQL COPY write pipe-delimited output instead.

The data is most likely being rejected because each row appears to have too few columns.
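The delimiter mismatch can be illustrated with a short sketch. It rests on assumptions that the question does not confirm: that `cur_vertica` comes from the vertica_python client (whose cursor has a `copy(sql, data)` method), that the Postgres cursor is a psycopg2 cursor, and that the NULL marker should be `\N`, the default in PostgreSQL's text format. `count_fields` and `copy_table` are hypothetical helper names. The table and column names are taken from the question.

```python
import io

# Postgres writes tab-delimited text with NULL as \N by default.
POSTGRES_COPY_SQL = "COPY (SELECT id, date FROM table_1) TO STDOUT"

# Tell Vertica to expect the same format instead of its default pipe delimiter.
VERTICA_COPY_SQL = (
    r"COPY table_1_temp FROM STDIN DELIMITER E'\t' NULL AS '\N' ABORT ON ERROR"
)


def count_fields(line, delimiter):
    """Number of fields a loader sees when it splits one data line."""
    return len(line.rstrip("\n").split(delimiter))


def copy_table(pg_cursor, vertica_cursor):
    """Stream the query result from Postgres into Vertica.

    pg_cursor is assumed to be a psycopg2 cursor and vertica_cursor a
    vertica_python cursor; the caller opens and closes both.
    Returns the number of data lines sent.
    """
    buffer = io.StringIO()
    pg_cursor.copy_expert(POSTGRES_COPY_SQL, buffer)
    data = buffer.getvalue()
    vertica_cursor.copy(VERTICA_COPY_SQL, data)
    vertica_cursor.execute("COMMIT;")
    return data.count("\n")
```

Splitting a two-column line such as `"1\t2016-12-01 14:19:59\n"` on a pipe yields a single field, which does not fit a two-column table, while a one-column line yields a single field with either delimiter. That would explain why only the single-column copy loaded. ABORT ON ERROR makes a remaining mismatch fail loudly instead of silently rejecting rows.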