Question

我requests.get()得到一些json。之后，我想将数据插入postgresql。发生了一些非常有趣的事情，如果我使用df.to_sql(index=False)，数据会被附加到postgresql中而没有问题，但postgresql中的Id不会创建自动增量值;该列完全是空的。如果我删除df.to_sql()中的参数，则会收到以下错误... IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique constraint。这是我的代码......

import requests
import pandas as pd
import sqlalchemy

urls = ['https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22DIA%22%20and%20startDate%20%3D%20%222015-01-01%22%20and%20endDate%20%3D%20%222015-12-31%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
    'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22DIA%22%20and%20startDate%20%3D%20%222016-01-01%22%20and%20endDate%20%3D%20%222016-11-08%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
    'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22SPY%22%20and%20startDate%20%3D%20%222015-01-01%22%20and%20endDate%20%3D%20%222015-12-31%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
    'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22SPY%22%20and%20startDate%20%3D%20%222016-01-01%22%20and%20endDate%20%3D%20%222016-11-08%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
    'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22IWN%22%20and%20startDate%20%3D%20%222015-01-01%22%20and%20endDate%20%3D%20%222015-12-31%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
    'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22IWN%22%20and%20startDate%20%3D%20%222016-01-01%22%20and%20endDate%20%3D%20%222016-11-08%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=']
df_list = []
for url in urls:
    data = requests.get(url)
    data_json = data.json()
    df = pd.DataFrame(data_json['query']['results']['quote'])
    df_list.append(df)


quote_df = pd.concat(df_list)
engine = sqlalchemy.create_engine('postgresql://postgres:wpc,.2016@localhost:5432/stocks')
quote_df.to_sql('quotes', engine, if_exists='append')

我想使用postgresql自动增量索引将df插入到postgresql中。如何修复我的代码。

问题更新10NOV2016 1900

我添加以下代码来修复数据框中的索引...

quote_df = pd.concat(df_list)
quote_df.index.name = 'Index'
quote_df = quote_df.reset_index()
quote_df['Index'] = quote_df.index

engine = create_engine('postgresql://postgres:wpc,.2016@localhost:5432/stocks')

quote_df.to_sql（'quotes'，engine，if_exists ='append'，index = False） engine.dispose（）

现在我在追加到postgresql时遇到以下错误...

ProgrammingError: (psycopg2.ProgrammingError) column "Index" of relation "quotes" does not exist LINE 1: INSERT INTO quotes ("Index", "Adj_Close", "Close", "Date", "...

该列确实存在于数据库中。

Answer 1

执行此操作的一种方式（多种方式）是：

获取最大Id并将其存储到变量（让我们称之为max_id）：

select max(Id) from quotes;

现在我们可以这样做：

原创DF：

In [55]: quote_df
Out[55]:
      Adj_Close       Close        Date        High         Low        Open Symbol   Volume
0    170.572764  173.990005  2015-12-31  175.649994  173.970001  175.089996    DIA  5773400
1    172.347213  175.800003  2015-12-30  176.720001  175.619995  176.570007    DIA  2910000
2     173.50403  176.979996  2015-12-29      177.25      176.00  176.190002    DIA  6145700
..          ...         ...         ...         ...         ...         ...    ...      ...
213   88.252244   89.480003  2016-01-06   90.099998   89.080002   89.279999    IWN  1570400
214   89.297697   90.540001  2016-01-05   90.620003       89.75   90.410004    IWN  2053100
215   88.893319   90.129997  2016-01-04   90.730003   89.360001   90.550003    IWN  2540600

[1404 rows x 8 columns]

现在我们可以按max_id增加索引：

In [56]: max_id = 123456    # <-- you don't need this line... 

In [57]: quote_df.index += max_id

并将索引设置为Id列：

In [58]: quote_df.reset_index().rename(columns={'index':'Id'})
Out[58]:
          Id   Adj_Close       Close        Date        High         Low        Open Symbol   Volume
0     123456  170.572764  173.990005  2015-12-31  175.649994  173.970001  175.089996    DIA  5773400
1     123457  172.347213  175.800003  2015-12-30  176.720001  175.619995  176.570007    DIA  2910000
2     123458   173.50403  176.979996  2015-12-29      177.25      176.00  176.190002    DIA  6145700
...      ...         ...         ...         ...         ...         ...         ...    ...      ...
1401  123669   88.252244   89.480003  2016-01-06   90.099998   89.080002   89.279999    IWN  1570400
1402  123670   89.297697   90.540001  2016-01-05   90.620003       89.75   90.410004    IWN  2053100
1403  123671   88.893319   90.129997  2016-01-04   90.730003   89.360001   90.550003    IWN  2540600

[1404 rows x 9 columns]

现在应该可以将此DF写入PostgreSQL指定（index=False）

Answer 2

回答我自己的问题11NOV2016 1112

我发现在df.reset_index()之后，我可以删除额外的列pandas create并且原始索引列保持重置状态。现在如果我执行没有index=False的代码，sqlalchemy会将索引插入postgres。这是解决我问题的代码......

import requests
import pandas as pd
from sqlalchemy import create_engine

urls = ['https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22DIA%22%20and%20startDate%20%3D%20%222015-01-01%22%20and%20endDate%20%3D%20%222015-12-31%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
    'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22DIA%22%20and%20startDate%20%3D%20%222016-01-01%22%20and%20endDate%20%3D%20%222016-11-11%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
    'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22SPY%22%20and%20startDate%20%3D%20%222015-01-01%22%20and%20endDate%20%3D%20%222015-12-31%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
    'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22SPY%22%20and%20startDate%20%3D%20%222016-01-01%22%20and%20endDate%20%3D%20%222016-11-11%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
    'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22IWN%22%20and%20startDate%20%3D%20%222015-01-01%22%20and%20endDate%20%3D%20%222015-12-31%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
    'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22IWN%22%20and%20startDate%20%3D%20%222016-01-01%22%20and%20endDate%20%3D%20%222016-11-11%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=']

df_list = []
for url in urls:
    data = requests.get(url)
    data_json = data.json()
    df = pd.DataFrame(data_json['query']['results']['quote'])
    df_list.append(df)

quote_df = pd.concat(df_list)
quote_df = quote_df.reset_index()
quote_df = quote_df.drop('index', 1)


engine = create_engine('postgresql://postgres:wpc,.2016@localhost:5432/stocks')
quote_df.to_sql('quotes', engine, if_exists='append')

使用idx autoincrement

问题更新10NOV2016 1900

2 个答案:

回答我自己的问题11NOV2016 1112