Question

I am working with a CSV that has a few hundred columns, many of which are just enumerations, i.e.:

[
['code_1', 'code_2', 'code_3', ..., 'code_50'],
[1, 2, 3, ..., 50],
[2, 3, 4, ..., 51],
...
[400000, 400001, 400002, ..., 400049]
]

I am importing this data into PostgreSQL and would like to concatenate these columns into a single array, e.g.:

[
['codes'],
['{1, 2, 3, ..., 50}']
]

and so on.

I know of a roundabout way I could do this, e.g.

df['codes'] = pd.DataFrame(["{" + df['code_1'] + ", " + df['code_2'] + "}"]).T

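But given the size of this CSV, that is a lot of redundant code to write and maintain.

What I basically have to work with is a list of columns; I have already pulled out the enumeration columns, e.g.: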

codes = [
    'code_1',
    'code_2',
    'code_3',
    ...
]

Before I go and write my own custom "implode_columns(arr)" function, is there anything in pandas that already solves this, or is there a convenient way to accommodate PostgreSQL arrays?

Answer 1

This assumes you are already connected to PostgreSQL and that the table already exists there. If not, see https://wiki.postgresql.org/wiki/Psycopg2_Tutorial

import psycopg2

try:
    conn = psycopg2.connect("host='localhost' dbname='template1' user='dbuser' password='dbpass'")
except psycopg2.Error:
    # Catch only database errors instead of using a bare except.
    print("I am unable to connect to the database")

First, open the .csv file.

>>> import csv
>>> with open('names.csv') as csvfile:
...     reader = csv.DictReader(csvfile)
...     for row in reader:
...         print(row['first_name'], row['last_name'])
...

The example above is from https://docs.python.org/2/library/csv.html. To insert into PostgreSQL, replace the print line with an INSERT.

>>> import psycopg2
>>> cur.execute("INSERT INTO test (num, data) VALUES (%s, %s)",
...             (100, "abc'def"))

You can replace (100, "abc'def") with (variable1, variable2); see http://initd.org/psycopg/docs/usage.html. Or, as a complete example:

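>>> import csv
>>> import psycopg2
>>> cur = conn.cursor()  # conn is the connection opened earlier
>>> with open('names.csv') as csvfile:
...     reader = csv.DictReader(csvfile)
...     for row in reader:
...         # variable1 and variable2 stand in for values taken from row
...         cur.execute("INSERT INTO test (num, data) VALUES (%s, %s)", (variable1, variable2))
...
>>> conn.commit()  # persist the inserted rows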
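
To tie this back to the question: psycopg2 adapts a Python list to a PostgreSQL array, so the enumerated columns can be collected into one list per row and passed as a single parameter. Below is a minimal sketch, not a definitive implementation. It assumes a hypothetical table `test` with an integer-array column `codes`; the helper names `implode_columns` (borrowed from the question) and `insert_codes` are made up for illustration. The cursor is passed in as an argument, so the sample at the bottom runs without a live connection.

```python
import csv
import io


def implode_columns(row, columns):
    """Collect the named columns of one CSV row into a list of ints."""
    return [int(row[name]) for name in columns]


def insert_codes(cur, csvfile, columns):
    """Insert one table row per CSV line, passing the codes as one list.

    Assumes a hypothetical table: CREATE TABLE test (codes integer[]).
    psycopg2 adapts a Python list parameter to a PostgreSQL array.
    """
    reader = csv.DictReader(csvfile)
    count = 0
    for row in reader:
        cur.execute("INSERT INTO test (codes) VALUES (%s)",
                    (implode_columns(row, columns),))
        count += 1
    return count


# Small in-memory sample standing in for the real CSV file.
sample = io.StringIO("code_1,code_2,code_3\n1,2,3\n2,3,4\n")
codes = ['code_1', 'code_2', 'code_3']
rows = [implode_columns(row, codes) for row in csv.DictReader(sample)]
print(rows)  # [[1, 2, 3], [2, 3, 4]]
```

With a real connection, calling `insert_codes(cur, csvfile, codes)` inside the `with open(...)` block and then `conn.commit()` should perform the import.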

Hope this helps...

Convert pandas columns to a PostgreSQL list?
