Python sqlite: the first query in a series of similar queries is slow

Time: 2018-11-14 20:19:30

Tags: python database sqlite

I have a sqlite database that I access through Python and sqlalchemy.

I have this function:

def _try_db_reads(runner, cfg):
    # Get last time seen.
    max_so_far = 0

    BM = BinManager(cfg.bin_dir,
                    None,
                    history=cfg.max_days,
                    max_bins=cfg.max_bins,
                    min_members=cfg.min_bin_members,
                    backsteps=cfg.hillclimb_backsteps)
    BM.load_bins()

    for i, (bin_id, users) in enumerate(BM.iterbins()):
        t0 = time.time()
        df = runner.DBM.get_df_daily_data_per_users(users)
        t1 = time.time()
        if t1 - t0 > max_so_far:
            max_so_far = t1-t0
        logger.info('for bin '+ str(bin_id) + ' of n_users=' + str(len(users)) + ' read df of shape ' + str(df.shape) + ' in ' + str(t1-t0) + ' max so far ' + str(max_so_far))

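Here, the for loop iterates over groups of roughly 100 users and fetches the data for each group in turn. The groups are generated randomly, and every user should have roughly the same amount of data in the db.

Here is the function used to read from the db: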

def get_df_daily_data_per_users(self, users):
    logger.info("Getting all daily data for %d users from %s." %
                (len(users), self.daily_table.__tablename__))
    session = self._get_session()
    query = session.query(self.daily_table).filter(self.daily_table.user.in_(users))
    df = pd.read_sql(query.statement, query.session.bind)
    session.close()
    logger.info("Daily data query complete, %d rows of data returned." % df.shape[0])
    return df

And here is part of the generated log:

2018-11-14 19:12:19 [INFO] corvil_mlcne.user_recognition.run_ur: for bin 0 of n_users=104 read df of shape (319074, 4) in 5.26866698265 max so far 5.26866698265
2018-11-14 19:12:19 [INFO] corvil_mlcne.user_recognition.database_tools: Getting all daily data for 104 users from daily_user_website.
2018-11-14 19:12:22 [INFO] corvil_mlcne.user_recognition.database_tools: Daily data query complete, 320980 rows of data returned.
2018-11-14 19:12:22 [INFO] corvil_mlcne.user_recognition.run_ur: for bin 1 of n_users=104 read df of shape (320980, 4) in 2.64458298683 max so far 5.26866698265
2018-11-14 19:12:22 [INFO] corvil_mlcne.user_recognition.database_tools: Getting all daily data for 104 users from daily_user_website.
2018-11-14 19:12:24 [INFO] corvil_mlcne.user_recognition.database_tools: Daily data query complete, 317565 rows of data returned.
2018-11-14 19:12:24 [INFO] corvil_mlcne.user_recognition.run_ur: for bin 2 of n_users=104 read df of shape (317565, 4) in 2.48706793785 max so far 5.26866698265
2018-11-14 19:12:24 [INFO] corvil_mlcne.user_recognition.database_tools: Getting all daily data for 104 users from daily_user_website.
2018-11-14 19:12:26 [INFO] corvil_mlcne.user_recognition.database_tools: Daily data query complete, 317662 rows of data returned.
2018-11-14 19:12:26 [INFO] corvil_mlcne.user_recognition.run_ur: for bin 3 of n_users=104 read df of shape (317662, 4) in 2.27176904678 max so far 5.26866698265
2018-11-14 19:12:26 [INFO] corvil_mlcne.user_recognition.database_tools: Getting all daily data for 104 users from daily_user_website.
2018-11-14 19:12:29 [INFO] corvil_mlcne.user_recognition.database_tools: Daily data query complete, 319764 rows of data returned.
2018-11-14 19:12:29 [INFO] corvil_mlcne.user_recognition.run_ur: for bin 4 of n_users=104 read df of shape (319764, 4) in 2.42617821693 max so far 5.26866698265
2018-11-14 19:12:29 [INFO] corvil_mlcne.user_recognition.database_tools: Getting all daily data for 104 users from daily_user_website.
2018-11-14 19:12:31 [INFO] corvil_mlcne.user_recognition.database_tools: Daily data query complete, 314175 rows of data returned.
2018-11-14 19:12:31 [INFO] corvil_mlcne.user_recognition.run_ur: for bin 5 of n_users=104 read df of shape (314175, 4) in 2.26107311249 max so far 5.26866698265
2018-11-14 19:12:31 [INFO] corvil_mlcne.user_recognition.database_tools: Getting all daily data for 104 users from daily_user_website.
2018-11-14 19:12:33 [INFO] corvil_mlcne.user_recognition.database_tools: Daily data query complete, 308365 rows of data returned.
2018-11-14 19:12:33 [INFO] corvil_mlcne.user_recognition.run_ur: for bin 6 of n_users=104 read df of shape (308365, 4) in 2.14715003967 max so far 5.26866698265
2018-11-14 19:12:33 [INFO] corvil_mlcne.user_recognition.database_tools: Getting all daily data for 104 users from daily_user_website.
2018-11-14 19:12:35 [INFO] corvil_mlcne.user_recognition.database_tools: Daily data query complete, 312768 rows of data returned.
2018

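Even though the calls are very similar, the first call takes more than twice as long as the ones that follow. Why is that? Is there a way to make the first query faster?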
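One way to narrow this down might be to check whether the extra time belongs to the first bin's data or simply to being the first query. The sketch below is not the code from the question: it uses the standard-library `sqlite3` module on a throwaway database, and the table `daily`, the column `user_id`, and all helper names are hypothetical. It times one `IN (...)` query per group on a single connection, then repeats the first group at the end so the two timings for the same data can be compared.

```python
import os
import sqlite3
import tempfile
import time


def build_demo_db(path, n_users=50, rows_per_user=200):
    """Create a small hypothetical table shaped like a per-user daily table."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE daily (user_id INTEGER, day INTEGER, value REAL)")
    conn.execute("CREATE INDEX idx_daily_user ON daily (user_id)")
    rows = ((u, d, float(u * d))
            for u in range(n_users)
            for d in range(rows_per_user))
    conn.executemany("INSERT INTO daily VALUES (?, ?, ?)", rows)
    conn.commit()
    conn.close()


def time_reads(path, user_groups):
    """Run one IN (...) query per group on a single shared connection.

    Returns a list of (row_count, seconds) tuples, one per group.
    """
    conn = sqlite3.connect(path)
    results = []
    try:
        for users in user_groups:
            placeholders = ",".join("?" * len(users))
            sql = ("SELECT user_id, day, value FROM daily "
                   "WHERE user_id IN (%s)" % placeholders)
            t0 = time.perf_counter()
            rows = conn.execute(sql, list(users)).fetchall()
            results.append((len(rows), time.perf_counter() - t0))
    finally:
        conn.close()
    return results


def run_demo():
    """Build the demo database in a temporary directory and time the reads."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "demo.sqlite")
        build_demo_db(path)
        groups = [list(range(start, start + 10)) for start in range(0, 50, 10)]
        # Repeat the first group last: if it is fast the second time, the
        # initial cost was a one-time warm-up and not that group's data.
        groups.append(groups[0])
        return time_reads(path, groups)


if __name__ == "__main__":
    for i, (n_rows, seconds) in enumerate(run_demo()):
        print("group %d: %d rows in %.4f s" % (i, n_rows, seconds))
```

On a database this small the timings will probably not show any difference; the pattern would have to be pointed at the real database file, ideally in a fresh process, to say anything about the behaviour in the log above.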

0 answers:

No answers