IndexError when attempting an INSERT ... WHERE NOT EXISTS with psycopg2

Date: 2015-07-04 13:11:38

Tags: python postgresql psycopg2

I'm trying to load some data into a PostgreSQL database using psycopg2. The function I use to load the database is below:

def load_db():
    data = clean_data()

    conn = psycopg2.connect(database='database', user='user')
    cur = conn.cursor()

    for d in data:
        publisher_id = (d[5]['publisher_id'])
        publisher = (d[4]['publisher'])

        cur.execute("INSERT INTO publisher (id, news_org) SELECT (%s,%s) WHERE NOT EXISTS (SELECT id FROM publisher WHERE id = %s);",
           (publisher_id, publisher))

    conn.commit()
    cur.close()
    conn.close()

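However, I get the error IndexError: tuple index out of range and I'm not sure what I'm doing wrong. The records I'm trying to insert contain many duplicate publisher_id and publisher values, hence the WHERE NOT EXISTS. I'm new to working with databases from Python, so I'm sure it's something simple. Thanks in advance!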

UPDATE!

A sample of data looks like this:

 [{'article_id': 7676933011},
  {'web_id': u'world/2015/jul/03/iranian-foreign-minister-raises-prospect-of-joint-action-against-islamic-state'},
  {'title': u'Iranian foreign minister raises prospect of joint action against Islamic State'},
  {'pub_date': u'2015-07-03T21:30:51Z'},
  {'publisher': 'The Guardian'},
  {'publisher_id': '1'},
  {'author': u'Julian Borger'},
  {'author_id': u'15924'},
  {'city_info': [{'city_name': u'Vienna',
                  'country_code': u'US',
                  'id': 4791160,
                  'lat': 38.90122,
                  'lon': -77.26526}]},
  {'country_info': [{'country_code': u'IR',
                     'country_name': u'Islamic Republic of Iran',
                     'lat': 32.0,
                     'lon': 53.0},
                    {'country_code': u'US',
                     'country_name': u'United States',
                     'lat': 39.76,
                     'lon': -98.5}]},
  {'org_info': [{'organization': u'Republican'},
                {'organization': u'US Congress'},
                {'organization': u'Palais Coburg Hotel'},
                {'organization': u'Islamic State'},
                {'organization': u'United'}]},
  {'people_info': [{'people': u'Mohammad Javad Zarif'},
                   {'people': u'John Kerry'}]}]

The full traceback is:

Traceback (most recent call last):
  File "/Users/Desktop/process_text/LoadDB.py", line 69, in <module>
    load_db()
  File "/Users/Desktop/process_text/LoadDB.py", line 50, in load_db
    (publisher_id, publisher))
IndexError: tuple index out of range

1 answer:

Answer 0 (score: 4)

The problem is in your cur.execute() call:

cur.execute("INSERT INTO publisher (id, news_org) SELECT (%s,%s) WHERE NOT EXISTS (SELECT id FROM publisher WHERE id = %s);",
       (publisher_id, publisher))

As you can see above, the query uses three %s placeholders (SELECT (%s,%s) plus WHERE id = %s), but you supply only two values in the tuple.

When cur.execute() looks up the third value internally, it indexes past the end of the tuple, which raises the IndexError.
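To make the mismatch visible before the query reaches the database, a rough sanity check can compare the number of placeholders with the number of values. This is only a sketch: count_positional_placeholders is a hypothetical helper, not part of psycopg2, and it counts %s naively (it skips literal %% but does not understand %(name)s placeholders).

```python
def count_positional_placeholders(sql):
    """Naively count %s placeholders, ignoring literal %% escapes."""
    return sql.replace("%%", "").count("%s")

# The query and values from the question.
query = ("INSERT INTO publisher (id, news_org) SELECT (%s,%s) "
         "WHERE NOT EXISTS (SELECT id FROM publisher WHERE id = %s);")
params = ("1", "The Guardian")

expected = count_positional_placeholders(query)
if expected != len(params):
    # Three placeholders, two values: this is the mismatch behind the IndexError.
    print("query has %d placeholders but %d values were given"
          % (expected, len(params)))
```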

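I'm not sure which values are intended, but you need to either reduce the query to two %s placeholders or supply a third value in the tuple, which currently holds only (publisher_id, publisher).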
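One possible fix, sketched under the assumption that the third placeholder (WHERE id = %s) is meant to receive publisher_id again. The names INSERT_PUBLISHER_SQL and insert_publisher are made up for illustration. The sketch also drops the parentheses around the SELECT list because, as far as I know, PostgreSQL reads (%s,%s) as a single row value rather than two columns, which would not match the two target columns.

```python
# Hypothetical names; neither appears in the question.
INSERT_PUBLISHER_SQL = (
    "INSERT INTO publisher (id, news_org) "
    "SELECT %s, %s "  # no parentheses: two separate columns
    "WHERE NOT EXISTS (SELECT id FROM publisher WHERE id = %s);"
)

def insert_publisher(cur, publisher_id, publisher):
    """Insert the publisher unless a row with the same id already exists."""
    # Assumption: the third %s should be publisher_id, so it is passed twice.
    cur.execute(INSERT_PUBLISHER_SQL, (publisher_id, publisher, publisher_id))
```

Inside the loop in load_db(), the cur.execute(...) call would then become insert_publisher(cur, publisher_id, publisher).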