I have a DataFrame that looks like this:
     word classification  counter
0   house           noun        2
1     the        article        2
2   white      adjective        1
3  yellow      adjective        1
I want to store it in a PostgreSQL table with the following definition:
CREATE TABLE public.word_classification (
id SERIAL,
word character varying(100),
classification character varying(10),
counter integer,
start_date date,
end_date date
);
ALTER TABLE public.word_classification OWNER TO postgres;
My current basic setup is as follows:
from sqlalchemy import create_engine
import pandas as pd
# Postgres username, password, and database name
POSTGRES_ADDRESS = 'localhost' ## INSERT YOUR DB ADDRESS IF IT'S NOT ON PANOPLY
POSTGRES_PORT = '5432'
POSTGRES_USERNAME = 'postgres' ## CHANGE THIS TO YOUR PANOPLY/POSTGRES USERNAME
POSTGRES_PASSWORD = 'BVict31C' ## CHANGE THIS TO YOUR PANOPLY/POSTGRES PASSWORD
POSTGRES_DBNAME = 'local-sandbox-dev' ## CHANGE THIS TO YOUR DATABASE NAME
# A long string that contains the necessary Postgres login information
postgres_str = ('postgresql://{username}:{password}@{ipaddress}:{port}/{dbname}'.format(username=POSTGRES_USERNAME,password=POSTGRES_PASSWORD,ipaddress=POSTGRES_ADDRESS,port=POSTGRES_PORT,dbname=POSTGRES_DBNAME))
# Create the connection
cnx = create_engine(postgres_str)
data=[['the','article',0],['house','noun',1],['yellow','adjective',2],
['the','article',4],['house','noun',5],['white','adjective',6]]
df = pd.DataFrame(data, columns=['word','classification','position'])
df_db = pd.DataFrame(columns=['word','classification','counter','start_date','end_date'])
count_series=df.groupby(['word','classification']).size()
new_df = count_series.to_frame(name = 'counter').reset_index()
df_db = new_df.to_sql('word_classification',cnx,if_exists='append',chunksize=1000)
I want to insert into the table the same way I can with plain SQL:
insert into word_classification(word, classification, counter)values('hello','world',1);
Currently the insert fails because the DataFrame index is being passed along as a column:
(psycopg2.errors.UndefinedColumn) column "index" of relation "word_classification" does not exist
LINE 1: INSERT INTO word_classification (index, word, classification...
^
[SQL: INSERT INTO word_classification (index, word, classification, counter) VALUES (%(index)s, %(word)s, %(classification)s, %(counter)s)]
[parameters: ({'index': 0, 'word': 'house', 'classification': 'noun', 'counter': 2}, {'index': 1, 'word': 'the', 'classification': 'article', 'counter': 2}, {'index': 2, 'word': 'white', 'classification': 'adjective', 'counter': 1}, {'index': 3, 'word': 'yellow', 'classification': 'adjective', 'counter': 1})]
I have been looking for a way to get rid of it, without any luck.
Thanks for your help.
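Answer 0 (score: 0)
By default, to_sql writes the DataFrame index as an extra column named "index", which does not exist in your table. Turn the index off when writing to the database, like this: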
df_db = new_df.to_sql('word_classification',cnx,if_exists='append',chunksize=1000, index=False)
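To check the behavior without a running PostgreSQL server, here is a minimal sketch that uses an in-memory SQLite database as a stand-in. This is an assumption made purely for illustration: the table definition is adapted to SQLite (INTEGER PRIMARY KEY AUTOINCREMENT in place of SERIAL), and the connection is a plain sqlite3 connection rather than the SQLAlchemy engine from the question. The to_sql call itself is the same.

```python
import sqlite3

import pandas as pd

# Assumption: in-memory SQLite stands in for the PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE word_classification (
        id INTEGER PRIMARY KEY AUTOINCREMENT,  -- plays the role of SERIAL
        word VARCHAR(100),
        classification VARCHAR(10),
        counter INTEGER,
        start_date DATE,
        end_date DATE
    )
    """
)

# Same sample data and aggregation as in the question.
data = [['the', 'article', 0], ['house', 'noun', 1], ['yellow', 'adjective', 2],
        ['the', 'article', 4], ['house', 'noun', 5], ['white', 'adjective', 6]]
df = pd.DataFrame(data, columns=['word', 'classification', 'position'])
new_df = (df.groupby(['word', 'classification'])
            .size()
            .to_frame(name='counter')
            .reset_index())

# index=False keeps the DataFrame index out of the INSERT statement,
# so only word, classification and counter are sent to the table.
new_df.to_sql('word_classification', conn, if_exists='append', index=False)

rows = conn.execute(
    "SELECT id, word, classification, counter, start_date, end_date "
    "FROM word_classification ORDER BY id"
).fetchall()
for row in rows:
    print(row)
```

The id column is filled in by the database, and start_date and end_date stay NULL because the DataFrame has no such columns, which matches what the hand-written INSERT in the question does.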