I'm trying to load some data into a PostgreSQL database with Psycopg2. The function I'm using to load the database is:
def load_db():
    data = clean_data()
    conn = psycopg2.connect(database='database', user='user')
    cur = conn.cursor()
    for d in data:
        publisher_id = (d[5]['publisher_id'])
        publisher = (d[4]['publisher'])
        cur.execute("INSERT INTO publisher (id, news_org) SELECT (%s,%s) WHERE NOT EXISTS (SELECT id FROM publisher WHERE id = %s);",
                    (publisher_id, publisher))
    conn.commit()
    cur.close()
    conn.close()
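But I get `IndexError: tuple index out of range` and I'm not sure what I'm doing wrong. Many of the records I'm trying to insert share the same `publisher_id` and `publisher`, hence the `WHERE NOT EXISTS`. I'm new to working with databases from Python, so I'm sure it's something simple. Thanks in advance!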
UPDATE!
A sample of `data` looks like this:
[{'article_id': 7676933011},
 {'web_id': u'world/2015/jul/03/iranian-foreign-minister-raises-prospect-of-joint-action-against-islamic-state'},
 {'title': u'Iranian foreign minister raises prospect of joint action against Islamic State'},
 {'pub_date': u'2015-07-03T21:30:51Z'},
 {'publisher': 'The Guardian'},
 {'publisher_id': '1'},
 {'author': u'Julian Borger'},
 {'author_id': u'15924'},
 {'city_info': [{'city_name': u'Vienna',
                 'country_code': u'US',
                 'id': 4791160,
                 'lat': 38.90122,
                 'lon': -77.26526}]},
 {'country_info': [{'country_code': u'IR',
                    'country_name': u'Islamic Republic of Iran',
                    'lat': 32.0,
                    'lon': 53.0},
                   {'country_code': u'US',
                    'country_name': u'United States',
                    'lat': 39.76,
                    'lon': -98.5}]},
 {'org_info': [{'organization': u'Republican'},
               {'organization': u'US Congress'},
               {'organization': u'Palais Coburg Hotel'},
               {'organization': u'Islamic State'},
               {'organization': u'United'}]},
 {'people_info': [{'people': u'Mohammad Javad Zarif'},
                  {'people': u'John Kerry'}]}]
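The loop reads `d[5]['publisher_id']` and `d[4]['publisher']`, which works for this sample only because those two fields happen to sit at positions 5 and 4 of the list. (These lookups are not what raises the error: `d` is a list, so a bad position there would say *list* index out of range, not *tuple*.) A minimal sketch, assuming Python 3 and using a hypothetical helper `flatten_record` that is not part of the question, which merges the single-key dicts into one dict so each field can be looked up by name regardless of order:

```python
# Hypothetical helper: merge one record (a list of single-key dicts, as in
# the sample above) into one dict keyed by field name.
def flatten_record(record):
    merged = {}
    for field in record:
        merged.update(field)
    return merged


# A shortened copy of the sample record: 'web_id' is left out, so the
# positions shift (d[4] would now be the publisher_id dict), but lookup
# by name is unaffected.
sample = [
    {'article_id': 7676933011},
    {'title': u'Iranian foreign minister raises prospect of joint action against Islamic State'},
    {'pub_date': u'2015-07-03T21:30:51Z'},
    {'publisher': 'The Guardian'},
    {'publisher_id': '1'},
]

rec = flatten_record(sample)
print(rec['publisher_id'], rec['publisher'])  # prints: 1 The Guardian
```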
The full traceback is:
Traceback (most recent call last):
  File "/Users/Desktop/process_text/LoadDB.py", line 69, in <module>
    load_db()
  File "/Users/Desktop/process_text/LoadDB.py", line 50, in load_db
    (publisher_id, publisher))
IndexError: tuple index out of range
Answer 0 (score: 4)
The problem is in your `cur.execute()` line:
cur.execute("INSERT INTO publisher (id, news_org) SELECT (%s,%s) WHERE NOT EXISTS (SELECT id FROM publisher WHERE id = %s);",
            (publisher_id, publisher))
As you can see above, the query contains three `%s` placeholders (`...SELECT (%s,%s)...WHERE id = %s);`), but you only supply two values (the two items in the tuple).
When `cur.execute` internally tries to look up the third value, there is none, which causes the index error.
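To see the mismatch concretely, the hypothetical helper below (a sanity check of my own, not part of psycopg2) counts the positional `%s` placeholders in the query from the question and compares that with the number of parameters. It discards `%%` first, since psycopg2 treats `%%` as a literal percent sign:

```python
# Hypothetical sanity check, not a psycopg2 API: count positional %s
# placeholders so they can be compared with len(params).
def count_placeholders(query):
    # '%%' is an escaped literal '%', so remove it before counting.
    return query.replace('%%', '').count('%s')


query = ("INSERT INTO publisher (id, news_org) SELECT (%s,%s) "
         "WHERE NOT EXISTS (SELECT id FROM publisher WHERE id = %s);")
params = ('1', 'The Guardian')

print(count_placeholders(query), len(params))  # prints: 3 2
```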
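I'm not sure which values are intended, but you need to either change the query to use two `%s`, or supply a third value in the tuple `(publisher_id, publisher)`. Since the third placeholder is the `id` in the `WHERE` clause, that would presumably be `(publisher_id, publisher, publisher_id)`.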
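A sketch of one way to write the fixed call, assuming the `publisher(id, news_org)` table from the question. It uses psycopg2's named placeholders (`%(name)s` with a dict of parameters), so the id can appear twice in the query while being passed only once. It also drops the parentheses around the selected values: in PostgreSQL, `SELECT (%s,%s)` builds a single row-valued column, so even with the parameter count fixed, inserting it into two columns would fail with "INSERT has more target columns than expressions". `QUERY`, `publisher_params` and `load_publishers` are hypothetical names; the connection is passed in so the function can be used with a `psycopg2.connect(...)` connection.

```python
# Sketch of the corrected insert; assumes the publisher(id, news_org) table
# from the question and a psycopg2 connection passed in by the caller.
QUERY = (
    "INSERT INTO publisher (id, news_org) "
    # No parentheses around the values: two columns, not one row value.
    "SELECT %(id)s, %(news_org)s "
    "WHERE NOT EXISTS (SELECT id FROM publisher WHERE id = %(id)s);"
)


def publisher_params(d):
    # Same positional lookups as the question's loop: d[5] and d[4].
    return {'id': d[5]['publisher_id'], 'news_org': d[4]['publisher']}


def load_publishers(conn, data):
    cur = conn.cursor()
    try:
        for d in data:
            # Named placeholders are looked up in the dict, so %(id)s can
            # appear twice in QUERY while the id is supplied only once.
            cur.execute(QUERY, publisher_params(d))
        conn.commit()
    finally:
        cur.close()


# Usage (requires psycopg2 and a reachable database):
# import psycopg2
# conn = psycopg2.connect(database='database', user='user')
# load_publishers(conn, clean_data())
# conn.close()
```

If `id` has a primary key or unique constraint and the server runs PostgreSQL 9.5 or later, `INSERT INTO publisher (id, news_org) VALUES (%(id)s, %(news_org)s) ON CONFLICT (id) DO NOTHING` is a simpler way to skip duplicates.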