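I use requests.get() to fetch some JSON, and then I want to insert the data into PostgreSQL. Something odd happens: if I call df.to_sql(index=False), the rows are appended to PostgreSQL without a problem, but PostgreSQL does not generate auto-increment values for Id; the column is completely empty. If I remove that parameter from df.to_sql(), I get the following error... IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique constraint
Here is my code...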
import requests
import pandas as pd
import sqlalchemy
urls = ['https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22DIA%22%20and%20startDate%20%3D%20%222015-01-01%22%20and%20endDate%20%3D%20%222015-12-31%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22DIA%22%20and%20startDate%20%3D%20%222016-01-01%22%20and%20endDate%20%3D%20%222016-11-08%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22SPY%22%20and%20startDate%20%3D%20%222015-01-01%22%20and%20endDate%20%3D%20%222015-12-31%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22SPY%22%20and%20startDate%20%3D%20%222016-01-01%22%20and%20endDate%20%3D%20%222016-11-08%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22IWN%22%20and%20startDate%20%3D%20%222015-01-01%22%20and%20endDate%20%3D%20%222015-12-31%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22IWN%22%20and%20startDate%20%3D%20%222016-01-01%22%20and%20endDate%20%3D%20%222016-11-08%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=']
df_list = []
for url in urls:
    data = requests.get(url)
    data_json = data.json()
    df = pd.DataFrame(data_json['query']['results']['quote'])
    df_list.append(df)
quote_df = pd.concat(df_list)
engine = sqlalchemy.create_engine('postgresql://postgres:wpc,.2016@localhost:5432/stocks')
quote_df.to_sql('quotes', engine, if_exists='append')
I want to insert the DataFrame into PostgreSQL and let PostgreSQL's auto-increment generate the Id.
How can I fix my code?
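A minimal sketch of the setup the question is after, assuming the Id column has a database-side default (in PostgreSQL, `"Id" SERIAL PRIMARY KEY` or an identity column; the question does not show the table definition). With `index=False` and no Id column in the DataFrame, the database numbers the rows itself. The empty Id column described above suggests the real table has no such default, so that is the part to check. To stay runnable without a server, the sketch uses an in-memory SQLite table, where `INTEGER PRIMARY KEY` stands in for `SERIAL`; against PostgreSQL you would pass the SQLAlchemy `engine` instead of `conn`.

```python
import sqlite3

import pandas as pd


def append_quotes(conn, quote_df):
    """Append rows without an Id so the database numbers them itself."""
    # index=False: do not write the DataFrame index as a column.
    # No "Id" column in quote_df: the database fills it from its own sequence.
    quote_df.to_sql("quotes", conn, if_exists="append", index=False)


# SQLite stand-in for a PostgreSQL table declared with "Id" SERIAL PRIMARY KEY
# (an assumption about the real schema, which the question does not show).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE quotes ("Id" INTEGER PRIMARY KEY, "Symbol" TEXT, "Close" REAL)'
)

# Two sample rows using Close values that appear in the answers' DataFrame.
batch = pd.DataFrame({"Symbol": ["DIA", "IWN"], "Close": [173.990005, 90.129997]})
append_quotes(conn, batch)
append_quotes(conn, batch)  # a second run must not collide on Id

ids = [row[0] for row in conn.execute('SELECT "Id" FROM quotes ORDER BY "Id"')]
print(ids)  # [1, 2, 3, 4]
```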
I added the following code to fix the index in the DataFrame...
quote_df = pd.concat(df_list)
quote_df.index.name = 'Index'
quote_df = quote_df.reset_index()
quote_df['Index'] = quote_df.index
engine = sqlalchemy.create_engine('postgresql://postgres:wpc,.2016@localhost:5432/stocks')
quote_df.to_sql('quotes', engine, if_exists='append', index=False)
engine.dispose()
Now I get the following error when appending to PostgreSQL...
ProgrammingError: (psycopg2.ProgrammingError) column "Index" of relation "quotes" does not exist LINE 1: INSERT INTO quotes ("Index", "Adj_Close", "Close", "Date", "...
The column does exist in the database.
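One possible explanation, offered as an assumption because the table's DDL is not shown: PostgreSQL folds unquoted identifiers to lowercase, while SQLAlchemy double-quotes mixed-case names such as "Index" in the INSERT. A column created as `Index` without quotes is actually stored as `index`, so `"Index"` is not found. `sqlalchemy.inspect(engine).get_columns("quotes")` lists the stored names; if they are all lowercase, a sketch like this (the helper name is hypothetical) renames the DataFrame columns to match before calling `to_sql`:

```python
import pandas as pd


def fold_to_unquoted_names(df):
    """Return a copy whose column names are lowercased, matching how
    PostgreSQL stores identifiers that were created without double quotes."""
    out = df.copy()
    out.columns = [str(col).lower() for col in out.columns]
    return out


quote_df = pd.DataFrame({"Index": [0, 1], "Adj_Close": [170.572764, 172.347213]})
folded = fold_to_unquoted_names(quote_df)
print(list(folded.columns))  # ['index', 'adj_close']
```

The original frame is left untouched, so the rename only affects what is sent to the database.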
Answer 0 (score: 0)
One way (of several) to do this:
Get the maximum Id and store it in a variable (let's call it max_id):
select max(Id) from quotes;
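A sketch of fetching that value from Python. Two assumptions are worth flagging: `MAX()` returns NULL on an empty table, so `COALESCE` turns it into 0; and if the column was created with the quoted name `"Id"`, PostgreSQL needs the quotes, because unquoted `max(Id)` is read as `max(id)`. SQLite stands in for the database here; with SQLAlchemy the equivalent is `conn.execute(sqlalchemy.text('SELECT COALESCE(MAX("Id"), 0) FROM quotes')).scalar()` inside `with engine.connect() as conn:`.

```python
import sqlite3


def fetch_max_id(conn, table="quotes", column="Id"):
    """Return the largest existing id, or 0 when the table is empty."""
    # MAX() yields NULL on an empty table; COALESCE turns that into 0.
    # Identifiers are interpolated, so only pass trusted table/column names.
    sql = f'SELECT COALESCE(MAX("{column}"), 0) FROM "{table}"'
    return conn.execute(sql).fetchone()[0]


conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL database
conn.execute('CREATE TABLE quotes ("Id" INTEGER PRIMARY KEY, "Symbol" TEXT)')
empty_max = fetch_max_id(conn)
conn.executemany(
    'INSERT INTO quotes ("Id", "Symbol") VALUES (?, ?)', [(5, "DIA"), (9, "SPY")]
)
print(empty_max, fetch_max_id(conn))  # 0 9
```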
Now we can do the following:
Original DF:
In [55]: quote_df
Out[55]:
Adj_Close Close Date High Low Open Symbol Volume
0 170.572764 173.990005 2015-12-31 175.649994 173.970001 175.089996 DIA 5773400
1 172.347213 175.800003 2015-12-30 176.720001 175.619995 176.570007 DIA 2910000
2 173.50403 176.979996 2015-12-29 177.25 176.00 176.190002 DIA 6145700
.. ... ... ... ... ... ... ... ...
213 88.252244 89.480003 2016-01-06 90.099998 89.080002 89.279999 IWN 1570400
214 89.297697 90.540001 2016-01-05 90.620003 89.75 90.410004 IWN 2053100
215 88.893319 90.129997 2016-01-04 90.730003 89.360001 90.550003 IWN 2540600
[1404 rows x 8 columns]
Now we can shift the index by max_id:
In [56]: max_id = 123456 # <-- placeholder; in real code use the max_id returned by the query above
In [57]: quote_df.index += max_id
and turn the index into an Id column:
In [58]: quote_df.reset_index().rename(columns={'index':'Id'})
Out[58]:
Id Adj_Close Close Date High Low Open Symbol Volume
0 123456 170.572764 173.990005 2015-12-31 175.649994 173.970001 175.089996 DIA 5773400
1 123457 172.347213 175.800003 2015-12-30 176.720001 175.619995 176.570007 DIA 2910000
2 123458 173.50403 176.979996 2015-12-29 177.25 176.00 176.190002 DIA 6145700
... ... ... ... ... ... ... ... ... ...
1401 123669 88.252244 89.480003 2016-01-06 90.099998 89.080002 89.279999 IWN 1570400
1402 123670 89.297697 90.540001 2016-01-05 90.620003 89.75 90.410004 IWN 2053100
1403 123671 88.893319 90.129997 2016-01-04 90.730003 89.360001 90.550003 IWN 2540600
[1404 rows x 9 columns]
Assign that result back to quote_df; it should then be possible to write this DF to PostgreSQL with index=False.
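Two details of the session above may be worth checking. `pd.concat` keeps each piece's own 0..n-1 labels, so after `quote_df.index += max_id` rows from different pieces can share an Id: in the output, row 1401 gets 123669, which is 213 + 123456, the value any other piece's row labelled 213 also gets. And adding `max_id` itself makes the first new Id equal to the existing maximum. A sketch that renumbers first and starts at `max_id + 1` (the helper name `assign_ids` is hypothetical):

```python
import pandas as pd


def assign_ids(quote_df, max_id):
    """Return a copy with an "Id" column numbered max_id + 1, max_id + 2, ..."""
    # pd.concat keeps each piece's own 0..n-1 labels, so renumber first.
    out = quote_df.reset_index(drop=True)
    start = max_id + 1  # max_id itself is already taken in the table
    out.insert(0, "Id", list(range(start, start + len(out))))
    return out


pieces = [pd.DataFrame({"Symbol": ["DIA", "DIA"]}), pd.DataFrame({"Symbol": ["SPY"]})]
quote_df = pd.concat(pieces)  # index labels are 0, 1, 0 (duplicated)
with_ids = assign_ids(quote_df, max_id=123456)
print(with_ids["Id"].tolist())  # [123457, 123458, 123459]
```

The result would then be written with `with_ids.to_sql('quotes', engine, if_exists='append', index=False)`.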
Answer 1 (score: 0)
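I found that after df.reset_index(), I can drop the extra column pandas creates, and the original index stays reset. Now if I run the code without index=False, SQLAlchemy inserts the index into Postgres. Here is the code that solved my problem...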
import requests
import pandas as pd
from sqlalchemy import create_engine
urls = ['https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22DIA%22%20and%20startDate%20%3D%20%222015-01-01%22%20and%20endDate%20%3D%20%222015-12-31%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22DIA%22%20and%20startDate%20%3D%20%222016-01-01%22%20and%20endDate%20%3D%20%222016-11-11%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22SPY%22%20and%20startDate%20%3D%20%222015-01-01%22%20and%20endDate%20%3D%20%222015-12-31%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22SPY%22%20and%20startDate%20%3D%20%222016-01-01%22%20and%20endDate%20%3D%20%222016-11-11%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22IWN%22%20and%20startDate%20%3D%20%222015-01-01%22%20and%20endDate%20%3D%20%222015-12-31%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22IWN%22%20and%20startDate%20%3D%20%222016-01-01%22%20and%20endDate%20%3D%20%222016-11-11%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=']
df_list = []
for url in urls:
    data = requests.get(url)
    data_json = data.json()
    df = pd.DataFrame(data_json['query']['results']['quote'])
    df_list.append(df)
quote_df = pd.concat(df_list)
quote_df = quote_df.reset_index()
quote_df = quote_df.drop(columns='index')  # drop the old per-piece index that reset_index() moved into a column
engine = create_engine('postgresql://postgres:wpc,.2016@localhost:5432/stocks')
quote_df.to_sql('quotes', engine, if_exists='append')
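The `reset_index()`/`drop()` pair gives the concatenated frame a fresh 0..n-1 index. A sketch, using small made-up frames, showing that `pd.concat(..., ignore_index=True)` produces the same frame in one step:

```python
import pandas as pd

df_list = [
    pd.DataFrame({"Symbol": ["DIA", "DIA"], "Close": [173.990005, 175.800003]}),
    pd.DataFrame({"Symbol": ["IWN"], "Close": [90.129997]}),
]

# The answer's approach: concat, move the old index into a column, drop it.
two_step = pd.concat(df_list).reset_index().drop(columns="index")

# Same result in one call: ignore_index=True renumbers rows 0..n-1.
one_step = pd.concat(df_list, ignore_index=True)

print(one_step.index.tolist())    # [0, 1, 2]
print(two_step.equals(one_step))  # True
```

Because `to_sql` is called without `index=False`, that 0..n-1 index is written to a column named `index` on every run, so repeated appends will write the same values again.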