Saving a DataFrame to a PostgreSQL table with a SERIAL auto-generated ID

Date: 2019-12-03 00:07:05

Tags: python pandas postgresql dataframe sql-insert

I have a DataFrame that looks like this:

     word classification  counter
0   house           noun        2
1     the        article        2
2   white      adjective        1
3  yellow      adjective        1

I want to store it in a PostgreSQL table with the following definition:

CREATE TABLE public.word_classification (
    id SERIAL,
    word character varying(100),
    classification character varying(10),
    counter integer,
    start_date date,
    end_date date
);
ALTER TABLE public.word_classification OWNER TO postgres;

My current basic setup is as follows:

from sqlalchemy import create_engine
import pandas as pd

# Postgres username, password, and database name
POSTGRES_ADDRESS = 'localhost' ## INSERT YOUR DB ADDRESS IF IT'S NOT ON PANOPLY
POSTGRES_PORT = '5432'
POSTGRES_USERNAME = 'postgres' ## CHANGE THIS TO YOUR PANOPLY/POSTGRES USERNAME
POSTGRES_PASSWORD = 'BVict31C' ## CHANGE THIS TO YOUR PANOPLY/POSTGRES PASSWORD 
POSTGRES_DBNAME = 'local-sandbox-dev' ## CHANGE THIS TO YOUR DATABASE NAME
# A long string that contains the necessary Postgres login information
postgres_str = ('postgresql://{username}:{password}@{ipaddress}:{port}/{dbname}'.format(username=POSTGRES_USERNAME,password=POSTGRES_PASSWORD,ipaddress=POSTGRES_ADDRESS,port=POSTGRES_PORT,dbname=POSTGRES_DBNAME))
# Create the connection
cnx = create_engine(postgres_str)

data=[['the','article',0],['house','noun',1],['yellow','adjective',2],
      ['the','article',4],['house','noun',5],['white','adjective',6]]
df = pd.DataFrame(data, columns=['word','classification','position'])
df_db = pd.DataFrame(columns=['word','classification','counter','start_date','end_date'])

count_series=df.groupby(['word','classification']).size()
new_df = count_series.to_frame(name = 'counter').reset_index()
df_db = new_df.to_sql('word_classification',cnx,if_exists='append',chunksize=1000)

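I want the insert to behave the way it does in plain SQL, where I list only the data columns and the id is generated automatically: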

insert into word_classification(word, classification, counter)values('hello','world',1);

Currently the insert fails because the DataFrame index is being sent as a column:

(psycopg2.errors.UndefinedColumn) column "index" of relation "word_classification" does not exist
LINE 1: INSERT INTO word_classification (index, word, classification...
                                         ^

[SQL: INSERT INTO word_classification (index, word, classification, counter) VALUES (%(index)s, %(word)s, %(classification)s, %(counter)s)]
[parameters: ({'index': 0, 'word': 'house', 'classification': 'noun', 'counter': 2}, {'index': 1, 'word': 'the', 'classification': 'article', 'counter': 2}, {'index': 2, 'word': 'white', 'classification': 'adjective', 'counter': 1}, {'index': 3, 'word': 'yellow', 'classification': 'adjective', 'counter': 1})]

I have been looking for a way to get rid of the index, without luck.

Thanks for your help.

1 Answer:

Answer 0: (score: 0)

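Turn off the index when writing to the database by passing index=False to to_sql. By default to_sql writes the DataFrame index as a column named "index", which is the column the error complains about: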

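# index=False keeps the DataFrame index out of the INSERT, so PostgreSQL fills id from the SERIAL sequence
new_df.to_sql('word_classification', cnx, if_exists='append', chunksize=1000, index=False)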
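
To check this behaviour without a running PostgreSQL server, here is a minimal sketch that uses an in-memory SQLite database as a stand-in. That substitution is an assumption made only so the example runs anywhere: `INTEGER PRIMARY KEY AUTOINCREMENT` plays the role of the `SERIAL` column, and the `conn` and `rows` names are illustrative. The `to_sql` call is the same one you would issue against the SQLAlchemy engine from the question.

```python
import sqlite3

import pandas as pd

# Assumption: in-memory SQLite stands in for PostgreSQL so the sketch is runnable.
# INTEGER PRIMARY KEY AUTOINCREMENT plays the role of the SERIAL id column.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE word_classification (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        word VARCHAR(100),
        classification VARCHAR(10),
        counter INTEGER,
        start_date DATE,
        end_date DATE
    )
    """
)

# Same sample data as in the question.
data = [['the', 'article', 0], ['house', 'noun', 1], ['yellow', 'adjective', 2],
        ['the', 'article', 4], ['house', 'noun', 5], ['white', 'adjective', 6]]
df = pd.DataFrame(data, columns=['word', 'classification', 'position'])

# Count occurrences per (word, classification) pair.
new_df = (
    df.groupby(['word', 'classification'])
      .size()
      .reset_index(name='counter')
)

# index=False: only word, classification and counter are inserted,
# so the database generates id itself and the date columns stay NULL.
new_df.to_sql('word_classification', conn, if_exists='append', index=False)

rows = conn.execute(
    "SELECT id, word, classification, counter "
    "FROM word_classification ORDER BY id"
).fetchall()
for row in rows:
    print(row)
```

Each row should come back with an id assigned by the database rather than by pandas, while start_date and end_date remain NULL because the DataFrame has no such columns.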