Creating/inserting JSON in Postgres with requests and psycopg2

Time: 2017-12-11 06:53:49

Tags: python json postgresql

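I'm just starting a project with PostgreSQL. I want to move from Excel to a database, and I'm stuck on the create and insert steps. Once this runs, I believe I'll have to switch it to an update so that I don't keep writing over the current data. I know my connection works, but I get the following error.

My error is: TypeError: not all arguments converted during string formatting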

#!/usr/bin/env python
import requests
import psycopg2

conn = psycopg2.connect(database='NHL', user='postgres', password='postgres', host='localhost', port='5432')

req = requests.get('http://www.nhl.com/stats/rest/skaters?isAggregate=false&reportType=basic&isGame=false&reportName=skatersummary&sort=[{%22property%22:%22playerName%22,%22direction%22:%22ASC%22},{%22property%22:%22goals%22,%22direction%22:%22DESC%22},{%22property%22:%22assists%22,%22direction%22:%22DESC%22}]&cayenneExp=gameTypeId=2%20and%20seasonId%3E=20172018%20and%20seasonId%3C=20172018') 
data = req.json()['data']

my_data = []
for item in data:
    season = item['seasonId']
    player = item['playerName']
    first_name = item['playerFirstName']
    last_Name = item['playerLastName']
    playerId = item['playerId']
    height = item['playerHeight']
    pos = item['playerPositionCode']
    handed = item['playerShootsCatches']
    city = item['playerBirthCity']
    country = item['playerBirthCountry']   
    state = item['playerBirthStateProvince']
    dob = item['playerBirthDate']
    draft_year = item['playerDraftYear']
    draft_round = item['playerDraftRoundNo']
    draft_overall = item['playerDraftOverallPickNo']
    my_data.append([playerId, player, first_name, last_Name, height, pos, handed, city, country, state, dob, draft_year, draft_round, draft_overall, season])

cur = conn.cursor()
cur.execute("CREATE TABLE t_skaters (data json);")
cur.executemany("INSERT INTO t_skaters VALUES (%s)", (my_data,))

Example of the data (the contents of my_data):

[[8468493, 'Ron Hainsey', 'Ron', 'Hainsey', 75, 'D', 'L', 'Bolton', 'USA', 'CT', '1981-03-24', 2000, 1, 13, 20172018], [8471339, 'Ryan Callahan', 'Ryan', 'Callahan', 70, 'R', 'R', 'Rochester', 'USA', 'NY', '1985-03-21', 2004, 4, 127, 20172018]]

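1 answer:

Answer 0: (score: 3)

It looks like you want to create a table with a single column named "data", of type JSON. (I would suggest creating one column per field, but that's up to you.)

In this case the variable data (read from the request) is a list of dicts. As I mentioned in my comment, you can loop over data and run the inserts one at a time, because executemany() is no faster than multiple calls to execute().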
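For context on the error in the question: executemany() runs the statement once per element of its second argument, and (my_data,) is a tuple with a single element, so the entire list of rows is handed to one statement that has only one %s. The sketch below reproduces the same message with Python's built-in % formatting, on the assumption that psycopg2's parameter substitution behaves analogously. The table name and sample rows come from the question, try_format is a hypothetical helper, and no database is involved.

```python
# Two rows from the question's sample data (shortened to three fields each).
rows = [
    [8468493, 'Ron Hainsey', 'D'],
    [8471339, 'Ryan Callahan', 'R'],
]

sql = "INSERT INTO t_skaters VALUES (%s)"


def try_format(statement, params):
    """Return the formatted string, or the TypeError message if formatting fails.

    Illustration only: plain % formatting does no quoting or escaping,
    so real queries should always go through cur.execute(statement, params).
    """
    try:
        return statement % tuple(params)
    except TypeError as exc:
        return "TypeError: " + str(exc)


# One placeholder, two arguments: the same mismatch as passing (my_data,).
mismatch = try_format(sql, rows)

# One placeholder, one argument: the shape a single-column insert expects.
match = try_format(sql, ['{"playerName": "Ron Hainsey"}'])

print(mismatch)
print(match)
```

The rule this illustrates is that each call needs exactly as many parameters as the statement has placeholders, which is why a single JSON string per row fits the one-column table.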

Here is what I did:

  1. Create a list of the fields you care about.
  2. Loop over the elements of data.
  3. For each item in data, extract the fields into my_data.
  4. Call execute() and pass in json.dumps(my_data) (which converts my_data from a dict into a JSON string).

Try this:

    #!/usr/bin/env python
    import requests
    import psycopg2
    import json

    conn = psycopg2.connect(database='NHL', user='postgres', password='postgres', host='localhost', port='5432')

    req = requests.get('http://www.nhl.com/stats/rest/skaters?isAggregate=false&reportType=basic&isGame=false&reportName=skatersummary&sort=[{%22property%22:%22playerName%22,%22direction%22:%22ASC%22},{%22property%22:%22goals%22,%22direction%22:%22DESC%22},{%22property%22:%22assists%22,%22direction%22:%22DESC%22}]&cayenneExp=gameTypeId=2%20and%20seasonId%3E=20172018%20and%20seasonId%3C=20172018')

    # data here is a list of dicts
    data = req.json()['data']

    cur = conn.cursor()
    # create a table with one column of type JSON
    cur.execute("CREATE TABLE t_skaters (data json);")

    fields = [
        'seasonId',
        'playerName',
        'playerFirstName',
        'playerLastName',
        'playerId',
        'playerHeight',
        'playerPositionCode',
        'playerShootsCatches',
        'playerBirthCity',
        'playerBirthCountry',
        'playerBirthStateProvince',
        'playerBirthDate',
        'playerDraftYear',
        'playerDraftRoundNo',
        'playerDraftOverallPickNo'
    ]

    for item in data:
        # keep only the fields of interest, as a dict
        my_data = {field: item[field] for field in fields}
        # one placeholder, one parameter: the dict serialized as a JSON string
        cur.execute("INSERT INTO t_skaters VALUES (%s)", (json.dumps(my_data),))

    # commit changes
    conn.commit()
    # Close the connection
    conn.close()

I am not 100% sure that all of the Postgres syntax here is correct (I don't have access to a PG database to test with), but I believe this logic should work for what you are trying to do.

Update: separate columns

You can modify the create statement to handle multiple columns, but you need to know the data type of each column. Here is some pseudocode you can follow:

    # same boilerplate code from above
    cur = conn.cursor()
    # create a table with one column per field
    cur.execute(
        """CREATE TABLE t_skaters (seasonId INTEGER, playerName VARCHAR, ...);"""
    )

    fields = [
        'seasonId',
        'playerName',
        'playerFirstName',
        'playerLastName',
        'playerId',
        'playerHeight',
        'playerPositionCode',
        'playerShootsCatches',
        'playerBirthCity',
        'playerBirthCountry',
        'playerBirthStateProvince',
        'playerBirthDate',
        'playerDraftYear',
        'playerDraftRoundNo',
        'playerDraftOverallPickNo'
    ]

    for item in data:
        my_data = [item[field] for field in fields]
        # need a placeholder (%s) for each variable
        # refer to postgres docs on INSERT statement on how to specify order
        cur.execute("INSERT INTO t_skaters VALUES (%s, %s, ...)", tuple(my_data))

    # commit changes
    conn.commit()
    # Close the connection
    conn.close()

Replace the ... with the appropriate data values.
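A minimal sketch of how the placeholder list for the multi-column insert could be generated instead of typed by hand. build_insert and row_values are hypothetical helpers, not part of psycopg2, and the three-field list is only an illustrative subset of the fields above. Column names are interpolated as plain text, so this assumes they come from a trusted, hard-coded list.

```python
def build_insert(table, fields):
    """Build an INSERT statement with one %s placeholder per column.

    Hypothetical helper: table and column names are inserted as plain text,
    so they must come from a trusted, hard-coded list (never user input).
    """
    columns = ", ".join(fields)
    placeholders = ", ".join(["%s"] * len(fields))
    return "INSERT INTO {} ({}) VALUES ({})".format(table, columns, placeholders)


def row_values(item, fields):
    """Pull the values for one row out of a dict, in the same order as fields."""
    return tuple(item[field] for field in fields)


# Illustrative subset of the fields used above.
fields = ['playerId', 'playerName', 'seasonId']
item = {'playerId': 8468493, 'playerName': 'Ron Hainsey', 'seasonId': 20172018}

statement = build_insert('t_skaters', fields)
values = row_values(item, fields)

print(statement)
print(values)
# With an open cursor, this pair would be passed as cur.execute(statement, values).
```

Naming the columns in the statement means the order of the values only has to match the fields list, not the order in which the table was defined. Note that Postgres folds unquoted identifiers to lowercase, so a column created as playerId is stored as playerid.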