SQL join of two tables in Python

Time: 2017-10-27 22:06:34

Tags: python sql sqlite

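There are already several questions about this same problem. I have read all of them but still have not found a solution. I want to join two tables that share a column with the same name (device_id) and count the rows.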

Table names: events and gender_age_train

import pandas as pd
from sqlalchemy import create_engine # database connection

db_engine = create_engine('sqlite:///devices-train.db')

join_devices = pd.read_sql_query('SELECT device_id, count(device_id), gender_age_train.device_id, count(gender_age_train.device_id) FROM events JOIN gender_age_train on events.device_id = gender_age_train.device_id GROUP BY device_id', db_engine)

print(join_devices)

The Python output is:

sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) ambiguous column name: device_id [SQL: 'SELECT device_id, count(device_id), gender_age_train.device_id, count(gender_age_train.device_id) FROM events JOIN gender_age_train on events.device_id = gender_age_train.device_id GROUP BY device_id']

2 answers:

Answer 0 (score: 1)

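You need to fully qualify the device_id column everywhere it is used (in the SELECT list and in GROUP BY) because, as you noted, it appears in both tables. It is also worth noting that selecting both events.device_id and gender_age_train.device_id is pointless, since the join condition guarantees they are equal. Selecting one of them is enough: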

SELECT   e.device_id, COUNT(*)
FROM     events e
JOIN     gender_age_train g ON e.device_id = g.device_id
GROUP BY e.device_id
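
For anyone who wants to try this without the original database, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. Only the table names and the device_id column come from the question; the event_id and gender columns and all of the rows are made up for illustration. It first reproduces the "ambiguous column name" error with an unqualified column, then runs the qualified query.

```python
import sqlite3

# In-memory database with hypothetical sample rows.
# Only the table names and device_id come from the question;
# event_id, gender, and the data are assumptions for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (event_id INTEGER, device_id INTEGER);
    CREATE TABLE gender_age_train (device_id INTEGER, gender TEXT);
    INSERT INTO events VALUES (1, 100), (2, 100), (3, 200), (4, 300);
    INSERT INTO gender_age_train VALUES (100, 'M'), (200, 'F');
""")

# An unqualified device_id is ambiguous because both tables have that column.
error_message = None
try:
    conn.execute(
        "SELECT device_id FROM events "
        "JOIN gender_age_train ON events.device_id = gender_age_train.device_id"
    )
except sqlite3.OperationalError as exc:
    error_message = str(exc)
print(error_message)

# Qualifying the column with a table alias removes the ambiguity.
query = """
    SELECT   e.device_id, COUNT(*) AS event_count
    FROM     events e
    JOIN     gender_age_train g ON e.device_id = g.device_id
    GROUP BY e.device_id
    ORDER BY e.device_id
"""
rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```

With this sample data, device 300 has events but no row in gender_age_train, so the inner join drops it from the result. The same query string should also work with pd.read_sql_query and the db_engine from the question.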

Answer 1 (score: 0)

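When the columns used for the join have the same name, you can use the USING clause. This not only saves typing, it also removes the duplicate column from the join output, so an unqualified device_id is no longer ambiguous: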

SELECT   device_id, COUNT(*)
FROM     events
JOIN     gender_age_train USING (device_id)
GROUP BY device_id;
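
The USING variant can be checked the same way. This is a small sketch with Python's built-in sqlite3 module and an in-memory database; the event_id and gender columns and the rows are hypothetical, only the table names and device_id come from the question. It runs the counting query and then lists the columns returned by SELECT * to show that device_id appears only once in the join output.

```python
import sqlite3

# In-memory database with hypothetical sample rows (assumed for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (event_id INTEGER, device_id INTEGER);
    CREATE TABLE gender_age_train (device_id INTEGER, gender TEXT);
    INSERT INTO events VALUES (1, 100), (2, 100), (3, 200), (4, 300);
    INSERT INTO gender_age_train VALUES (100, 'M'), (200, 'F');
""")

# With USING, device_id can be referenced without a table prefix.
using_query = """
    SELECT   device_id, COUNT(*) AS event_count
    FROM     events
    JOIN     gender_age_train USING (device_id)
    GROUP BY device_id
    ORDER BY device_id
"""
counts = conn.execute(using_query).fetchall()
print(counts)

# SELECT * on a USING join returns the shared column only once.
cursor = conn.execute(
    "SELECT * FROM events JOIN gender_age_train USING (device_id)"
)
columns = [col[0] for col in cursor.description]
print(columns)
conn.close()
```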