Joining SQL tables in memory

Time: 2017-07-29 17:31:46

Tags: python mysql sql in-memory-database

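I have 3 queries, each of which extracts one table (see the script below). I would like to join these tables into a new table without having to save the tables from the 3 original queries in the database (in memory only). Is this possible?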

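I want to do this for two reasons:

  1. I could not get CREATE TABLE my_table SELECT .. to work with connection.commit() and so on to save the table on the server.

  2. It would be more efficient, because the tables are fairly large and I do not need to store them in the remote database (only locally; I use pickle files for daily backups).

Code: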

    from mysql.connector import connect as sql_connect
    import pickle  # cPickle exists only on Python 2; print(..., end=" ") below needs Python 3
    
    def extract_values_with_columns(cursor, query, multi=False, verbose=False):
        cursor.execute(query, multi=multi)
        results = list(cursor.fetchall())
        field_names = [i[0] for i in cursor.description]
        if verbose:
            print("Variables: {}".format(field_names), end=" ")
        results.insert(0, field_names)
        return results
    
    def save(dset_name, results):
        # pickle output is binary, so open the file in 'wb' mode
        with open("{}.pickle".format(dset_name), mode='wb') as f:
            pickle.dump(results, f)
    
    if __name__ == '__main__':
        connection = sql_connect(user=SSH_USERNAME, password=DATABASE_PASSWORD,
                                     host='127.0.0.1', port=tunnel.local_bind_port,
                                     database=DATABASE_NAME)      
    
        print("Connection successful!")
        cursor = connection.cursor()                      # get the cursor
        cursor.execute("USE {}".format(DATABASE_NAME))    # select the database
    
        # combine ratings and tweet text
        query = "SELECT rt.tweet_id, rt.rating_id, rt.tweet_text, \
                 {} \
                 FROM contribute_ratedtweet rt \
                 INNER JOIN contribute_rating ra ON rt.rating_id=ra.id".format(emotion_factors)
        results = extract_values_with_columns(cursor, query)
        save('agg_tweets_with_ratings', results)
    
        # combine profiles with demographics and technical data
        # joins should be done on the original variable name, not the renamed one
        demo_vars = "demo.gender, demo.age, demo.ethnicity, demo.education, demo.language, demo.done"
        tech_vars = "tech.entry_point, tech.ip_addr, tech.user_agent, tech.mobile, tech.referrer, tech.time_taken, tech.usage, tech.sharing_consent, tech.time_started"
        query =  "SELECT pro.username, pro.random_seed, \
                 demo.id AS demographic_id, {}, \
                 tech.id AS technical_data_id, {} \
                 FROM contribute_profile pro \
                 INNER JOIN contribute_demographic demo ON pro.demographic_id=demo.id \
                 INNER JOIN contribute_technicaldata tech ON pro.technical_data_id=tech.id".format(demo_vars, tech_vars)
        results = extract_values_with_columns(cursor, query)
        save('agg_profiles_with_info', results)
    
        # add userID and tweet ID for convenience to rated tweets
        query = "SELECT pro_rt.profile_id, pro_rt.ratedtweet_id, pro.username, rt.tweet_id \
                 FROM contribute_profile_rated_tweets pro_rt \
                 INNER JOIN contribute_profile pro ON pro_rt.profile_id=pro.id \
                 INNER JOIN contribute_ratedtweet rt ON pro_rt.ratedtweet_id=rt.id"
        results = extract_values_with_columns(cursor, query)
        save('agg_ratings_with_info', results)
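
For reference, the kind of in-memory join being asked about can be sketched in plain Python on the result lists that extract_values_with_columns returns (first row holds the column names). This is only a minimal sketch: join_results, the "right_" prefix, and the toy rows are hypothetical and not part of the script above.

```python
def join_results(left, right, left_key, right_key):
    """Inner-join two result lists whose first row holds the column names."""
    left_header, left_rows = left[0], left[1:]
    right_header, right_rows = right[0], right[1:]
    li = left_header.index(left_key)
    ri = right_header.index(right_key)

    # Index the right-hand rows by their join key (a simple hash join).
    index = {}
    for row in right_rows:
        index.setdefault(row[ri], []).append(row)

    # Prefix column names that already exist on the left so the header stays unique.
    seen = set(left_header)
    right_out = ["right_" + n if n in seen else n for n in right_header]

    joined = [list(left_header) + right_out]
    for row in left_rows:
        for match in index.get(row[li], []):
            joined.append(tuple(row) + tuple(match))
    return joined


# Toy data shaped like the saved results (hypothetical values).
ratings_with_info = [
    ["profile_id", "ratedtweet_id", "username", "tweet_id"],
    (1, 10, "alice", 100),
    (2, 11, "bob", 101),
]
tweets_with_ratings = [
    ["tweet_id", "rating_id", "tweet_text"],
    (100, 7, "hello"),
    (102, 8, "unmatched"),
]

combined = join_results(ratings_with_info, tweets_with_ratings,
                        "tweet_id", "tweet_id")
```

Rows without a match on the other side are dropped, which mirrors INNER JOIN semantics; nothing is written back to the database.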
    

1 answer:

Answer 0 (score: 1)

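Since all three queries are related (qry2 --> qry3 --> qry1), consider using derived tables (nested queries in the FROM and JOIN clauses). The SQL sketch at the end of this answer treats each query as its own table result set. Depending on the nature of the data, however, this may return duplicate rows, so deduplicate either in each subquery or in the outer query.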
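
The duplicate-row caveat can be illustrated with a small sketch. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL, and the two toy tables (profile, profile_rated_tweets) and their rows are assumptions made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE profile (id INTEGER, username TEXT);
    CREATE TABLE profile_rated_tweets (profile_id INTEGER, ratedtweet_id INTEGER);
    INSERT INTO profile VALUES (1, 'alice');
    -- the pair (1, 10) appears twice on purpose
    INSERT INTO profile_rated_tweets VALUES (1, 10), (1, 10), (1, 11);
""")

# Derived table without deduplication: the repeated pair survives the join.
raw_rows = conn.execute("""
    SELECT pro.username, t.ratedtweet_id
    FROM profile pro
    INNER JOIN (SELECT profile_id, ratedtweet_id
                FROM profile_rated_tweets) t
      ON t.profile_id = pro.id
    ORDER BY t.ratedtweet_id
""").fetchall()

# Same join, but the subquery removes duplicates with DISTINCT.
dedup_rows = conn.execute("""
    SELECT pro.username, t.ratedtweet_id
    FROM profile pro
    INNER JOIN (SELECT DISTINCT profile_id, ratedtweet_id
                FROM profile_rated_tweets) t
      ON t.profile_id = pro.id
    ORDER BY t.ratedtweet_id
""").fetchall()

conn.close()
```

Putting DISTINCT on the outer SELECT instead would give the same rows here; which place is better depends on where the duplicates originate.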

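Also, be sure to give columns unique names so that aliases are not repeated in the outer query's SELECT list, and, importantly, use the correct ON clauses for the joins between t1, t2, and t3. Fill in the ... accordingly, renaming columns with AS where needed. If you do not expect the results to match exactly, use LEFT JOIN instead of INNER JOIN.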

    SELECT t1.*, t2.*, t3.*
    FROM
      (SELECT ...
       FROM contribute_profile pro
       INNER JOIN contribute_demographic demo
         ON pro.demographic_id=demo.id
       INNER JOIN contribute_technicaldata tech
         ON pro.technical_data_id=tech.id) t1

    INNER JOIN
      (SELECT ...
       FROM contribute_profile_rated_tweets pro_rt
       INNER JOIN contribute_profile pro
         ON pro_rt.profile_id=pro.id
       INNER JOIN contribute_ratedtweet rt
         ON pro_rt.ratedtweet_id=rt.id) t2
    ON t1.profile_id = t2.profile_id

    INNER JOIN
      (SELECT ...
       FROM contribute_ratedtweet rt
       INNER JOIN contribute_rating ra
         ON rt.rating_id=ra.id) t3
    ON t2.tweet_rating_id = t3.tweet_rating_id
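
To try the pattern end to end, here is a minimal runnable sketch. It uses sqlite3 in memory as a stand-in for the MySQL server, and everything that fills in the ... above is an assumption: the reduced column lists, the score column, the toy rows, and the choice of ratedtweet_id as the key between t2 and t3 (in place of the placeholder tweet_rating_id).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contribute_demographic (id INTEGER, gender TEXT);
    CREATE TABLE contribute_technicaldata (id INTEGER, mobile INTEGER);
    CREATE TABLE contribute_profile (id INTEGER, username TEXT,
                                     demographic_id INTEGER,
                                     technical_data_id INTEGER);
    CREATE TABLE contribute_rating (id INTEGER, score INTEGER);
    CREATE TABLE contribute_ratedtweet (id INTEGER, tweet_id INTEGER,
                                        tweet_text TEXT, rating_id INTEGER);
    CREATE TABLE contribute_profile_rated_tweets (profile_id INTEGER,
                                                  ratedtweet_id INTEGER);

    INSERT INTO contribute_demographic VALUES (1, 'f'), (2, 'm');
    INSERT INTO contribute_technicaldata VALUES (1, 0), (2, 1);
    INSERT INTO contribute_profile VALUES (1, 'alice', 1, 1), (2, 'bob', 2, 2);
    INSERT INTO contribute_rating VALUES (1, 5), (2, 3);
    INSERT INTO contribute_ratedtweet VALUES (10, 100, 'hello', 1),
                                             (11, 101, 'world', 2);
    INSERT INTO contribute_profile_rated_tweets VALUES (1, 10), (2, 11);
""")

# One query, three derived tables; each exposes its join key under a clear alias.
query = """
    SELECT t1.username, t1.gender, t2.tweet_id, t3.tweet_text, t3.score
    FROM
      (SELECT pro.id AS profile_id, pro.username, demo.gender, tech.mobile
       FROM contribute_profile pro
       INNER JOIN contribute_demographic demo ON pro.demographic_id = demo.id
       INNER JOIN contribute_technicaldata tech ON pro.technical_data_id = tech.id) t1
    INNER JOIN
      (SELECT pro_rt.profile_id, pro_rt.ratedtweet_id, rt.tweet_id
       FROM contribute_profile_rated_tweets pro_rt
       INNER JOIN contribute_profile pro ON pro_rt.profile_id = pro.id
       INNER JOIN contribute_ratedtweet rt ON pro_rt.ratedtweet_id = rt.id) t2
      ON t1.profile_id = t2.profile_id
    INNER JOIN
      (SELECT rt.id AS ratedtweet_id, rt.tweet_text, ra.score
       FROM contribute_ratedtweet rt
       INNER JOIN contribute_rating ra ON rt.rating_id = ra.id) t3
      ON t2.ratedtweet_id = t3.ratedtweet_id
    ORDER BY t1.username
"""

cursor = conn.execute(query)
field_names = [col[0] for col in cursor.description]
rows = cursor.fetchall()
results = [field_names] + rows  # same shape as extract_values_with_columns returns
conn.close()
```

Because the join happens inside a single SELECT, no intermediate table is created on the server; only the final result set is fetched, and it can be pickled locally like the other results.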