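I have 3 queries, each of which extracts a table (see the script below). I want to join these tables into a new table without having to save the tables from the 3 original queries in the database (in memory only). Is this possible?
I want to do this for two reasons:
I can't get CREATE TABLE my_table SELECT .. to work with connection.commit() to save the table on the server.
Since the tables are quite large and I don't need to store them in the remote database (only locally; I use pickle files for daily backups), it would be more efficient.
Code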
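from mysql.connector import connect as sql_connect
import pickle  # Python 3: cPickle no longer exists; print(..., end=" ") below already implies Python 3


def extract_values_with_columns(cursor, query, multi=False, verbose=False):
    """Run the query and return its rows, with the column names as the first row."""
    cursor.execute(query, multi=multi)
    results = list(cursor.fetchall())
    field_names = [i[0] for i in cursor.description]
    if verbose:
        print("Variables: {}".format(field_names), end=" ")
    results.insert(0, field_names)
    return results


def save(dset_name, results):
    """Pickle the results to <dset_name>.pickle in the working directory."""
    # pickle produces bytes, so the file has to be opened in binary mode
    with open("{}.pickle".format(dset_name), mode='wb') as f:
        pickle.dump(results, f)

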
if __name__ == '__main__':
    # SSH_USERNAME, DATABASE_PASSWORD, DATABASE_NAME, tunnel and emotion_factors
    # are defined elsewhere in the script (not shown here).
    connection = sql_connect(user=SSH_USERNAME, password=DATABASE_PASSWORD,
                             host='127.0.0.1', port=tunnel.local_bind_port,
                             database=DATABASE_NAME)
    print("Connection successful!")
    cursor = connection.cursor()  # get the cursor
    cursor.execute("USE {}".format(DATABASE_NAME))  # select the database

    # combine ratings and tweet text
    query = "SELECT rt.tweet_id, rt.rating_id, rt.tweet_text, \
        {} \
        FROM contribute_ratedtweet rt \
        INNER JOIN contribute_rating ra ON rt.rating_id=ra.id".format(emotion_factors)
    results = extract_values_with_columns(cursor, query)
    save('agg_tweets_with_ratings', results)

    # combine profiles with demographics and technical data
    # joins should be done on the original variable name, not the renamed one
    demo_vars = "demo.gender, demo.age, demo.ethnicity, demo.education, demo.language, demo.done"
    tech_vars = "tech.entry_point, tech.ip_addr, tech.user_agent, tech.mobile, tech.referrer, tech.time_taken, tech.usage, tech.sharing_consent, tech.time_started"
    query = "SELECT pro.username, pro.random_seed, \
        demo.id AS demographic_id, {}, \
        tech.id AS technical_data_id, {} \
        FROM contribute_profile pro \
        INNER JOIN contribute_demographic demo ON pro.demographic_id=demo.id \
        INNER JOIN contribute_technicaldata tech ON pro.technical_data_id=tech.id".format(demo_vars, tech_vars)
    results = extract_values_with_columns(cursor, query)
    save('agg_profiles_with_info', results)

    # add userID and tweet ID for convenience to rated tweets
    query = "SELECT pro_rt.profile_id, pro_rt.ratedtweet_id, pro.username, rt.tweet_id \
        FROM contribute_profile_rated_tweets pro_rt \
        INNER JOIN contribute_profile pro ON pro_rt.profile_id=pro.id \
        INNER JOIN contribute_ratedtweet rt ON pro_rt.ratedtweet_id=rt.id"
    results = extract_values_with_columns(cursor, query)
    save('agg_ratings_with_info', results)
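
For comparison, a purely client-side join of the result lists is also possible. The following is a minimal sketch, assuming each list has the shape returned by extract_values_with_columns (column names in the first row, data rows after it). The helper name join_results and the sample rows are hypothetical; the column names are taken from the second and third queries, which both select pro.username.

```python
def join_results(left, right, left_key, right_key):
    """Inner-join two result lists whose first row holds the column names."""
    left_header, right_header = left[0], right[0]
    li = left_header.index(left_key)
    ri = right_header.index(right_key)

    # Index the right-hand rows by key so each left row is matched by lookup.
    index = {}
    for row in right[1:]:
        index.setdefault(row[ri], []).append(row)

    # Drop the right-hand key column so it does not appear twice.
    header = list(left_header) + [c for i, c in enumerate(right_header) if i != ri]
    joined = [header]
    for row in left[1:]:
        for match in index.get(row[li], []):
            rest = tuple(v for i, v in enumerate(match) if i != ri)
            joined.append(tuple(row) + rest)
    return joined


# Hypothetical sample data in the same layout as the pickled results.
profiles = [["username", "random_seed"], ("alice", 7), ("bob", 3)]
ratings = [["profile_id", "ratedtweet_id", "username", "tweet_id"],
           (1, 100, "alice", 9001),
           (1, 101, "alice", 9002)]

combined = join_results(profiles, ratings, "username", "username")
```

Rows without a partner on the other side (here "bob") are dropped, which mirrors INNER JOIN. With large tables this keeps everything in client memory, so doing the join inside a single SQL query is likely to be cheaper.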
Answer 0 (score: 1)
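Because all three queries are related through qry2 --> qry3 --> qry1, consider using derived tables (nested queries in the FROM or JOIN clause). Below is a sketch in which each query is treated as its own table result set. Depending on the nature of the data, this may return duplicates, so de-duplicate either in each subquery or in the outer query.
Also, be sure to give the columns unique names so that aliases are not repeated in the outer query's select list and, importantly, use the correct ON clauses for the joins between t1, t2, and t3. Fill in the ... accordingly, renaming with AS where needed. If the results are not expected to match exactly, use LEFT JOIN instead of INNER JOIN.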
SELECT t1.*, t2.*, t3.*
FROM
    (SELECT ...
     FROM contribute_profile pro
     INNER JOIN contribute_demographic demo
         ON pro.demographic_id=demo.id
     INNER JOIN contribute_technicaldata tech
         ON pro.technical_data_id=tech.id) t1
INNER JOIN
    (SELECT ...
     FROM contribute_profile_rated_tweets pro_rt
     INNER JOIN contribute_profile pro
         ON pro_rt.profile_id=pro.id
     INNER JOIN contribute_ratedtweet rt
         ON pro_rt.ratedtweet_id=rt.id) t2
    ON t1.profile_id = t2.profile_id
INNER JOIN
    (SELECT ...
     FROM contribute_ratedtweet rt
     INNER JOIN contribute_rating ra
         ON rt.rating_id=ra.id) t3
    ON t2.tweet_rating_id = t3.tweet_rating_id
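
The derived-table idea can be tried out end to end from Python. The sketch below is only an illustration under assumptions: it uses the standard-library sqlite3 module as a stand-in for MySQL, and the tables (profile, ratedtweet, rating, profile_rated_tweets), their columns, and the sample rows are hypothetical simplifications, not the real contribute_* schema. It shows each subquery exposing its join key under an explicit alias, DISTINCT removing a duplicated link row, and LEFT JOIN keeping a profile that has no rated tweets.

```python
import sqlite3

# Hypothetical, simplified schema; sqlite3 stands in for the MySQL server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE profile (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE rating (id INTEGER PRIMARY KEY, joy INTEGER);
CREATE TABLE ratedtweet (id INTEGER PRIMARY KEY, tweet_id INTEGER, rating_id INTEGER);
CREATE TABLE profile_rated_tweets (profile_id INTEGER, ratedtweet_id INTEGER);
INSERT INTO profile VALUES (1, 'alice'), (2, 'bob');
INSERT INTO rating VALUES (10, 3), (11, 5);
INSERT INTO ratedtweet VALUES (100, 9001, 10), (101, 9002, 11);
-- the (1, 100) link is stored twice on purpose to show de-duplication
INSERT INTO profile_rated_tweets VALUES (1, 100), (1, 100), (1, 101);
""")

query = """
SELECT t1.profile_id    AS profile_id,
       t1.username      AS username,
       t2.ratedtweet_id AS ratedtweet_id,
       t3.tweet_id      AS tweet_id,
       t3.joy           AS joy
FROM
    (SELECT pro.id AS profile_id, pro.username
     FROM profile pro) t1
LEFT JOIN
    (SELECT DISTINCT pro_rt.profile_id, pro_rt.ratedtweet_id
     FROM profile_rated_tweets pro_rt) t2
    ON t1.profile_id = t2.profile_id
LEFT JOIN
    (SELECT rt.id AS ratedtweet_id, rt.tweet_id, ra.joy
     FROM ratedtweet rt
     INNER JOIN rating ra ON rt.rating_id = ra.id) t3
    ON t2.ratedtweet_id = t3.ratedtweet_id
ORDER BY t1.profile_id, t2.ratedtweet_id
"""

cur.execute(query)
columns = [d[0] for d in cur.description]  # column names of the combined result
rows = cur.fetchall()                      # only the joined result reaches the client

for row in rows:
    print(dict(zip(columns, row)))
```

Against MySQL, the same kind of query string could presumably be passed to extract_values_with_columns from the question, so that only the combined result is fetched and pickled and nothing is written to the server.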