Python: reducing the run time of a for loop

Time: 2019-05-21 10:36:01

Tags: python pandas dataframe for-loop

I want to compute the APRU for several countries.

country_list = ['us','gb','ca','id']

count = {}
for i in country_list:
    count[i] = df_day_country[df_day_country.isin([i])]
    count[i+'_reverse'] = count[i].iloc[::-1]
    for j in range(1,len(count[i+'_reverse'])): 
        count[i+'_reverse']['count'].iloc[j] = count[i+'_reverse']['count'][j-1:j+1].sum()
    for k in range(1,len(count[i])): 
        count[i][revenue_sum].iloc[k] = count[i][revenue_sum][k-1:k+1].sum()

    count[i]['APRU'] = count[i][revenue_sum] / count[i]['count'][0]/100

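Then I will create 4 DataFrames, df_us, df_gb, df_ca and df_id, showing the APRU for each country.

But the dataset is large, and as the country list grows the code runs very slowly. Is there any way to reduce the run time?

2 answers:

Answer 0: (score: 0)

Consider using numba.

Your code then becomes: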

from numba import njit

country_list = ['us','gb','ca','id']

@njit
def count(country_list):
  count = {}
  for i in country_list:
      count[i] = df_day_country[df_day_country.isin([i])]
      count[i+'_reverse'] = count[i].iloc[::-1]
      for j in range(1,len(count[i+'_reverse'])): 
          count[i+'_reverse']['count'].iloc[j] = count[i+'_reverse']['count'][j-1:j+1].sum()
      for k in range(1,len(count[i])): 
          count[i][revenue_sum].iloc[k] = count[i][revenue_sum][k-1:k+1].sum()

      count[i]['APRU'] = count[i][revenue_sum] / count[i]['count'][0]/100
  return count

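Numba makes Python loops much faster and is being integrated into more full-featured Python libraries such as scipy. Definitely take a look. Note that `@njit` (nopython mode) compiles code that works on NumPy arrays and scalars, not on pandas DataFrames, so the loops have to run on the underlying arrays (for example via `.to_numpy()`) for the decorator to help.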
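To make that concrete, here is a minimal sketch of a numba-friendly loop. It assumes the inner loops in the question are meant as running totals; `running_sum` and the `revenue` values are hypothetical names and data for illustration, not part of the original code. The `try`/`except` only lets the sketch run uncompiled when numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs (uncompiled) without numba installed.
    def njit(func):
        return func


@njit
def running_sum(values):
    """Running total of a 1-D float array."""
    out = np.empty_like(values)
    total = 0.0
    for idx in range(values.shape[0]):
        total += values[idx]
        out[idx] = total
    return out


# Hypothetical revenue values for one country, taken out of pandas first,
# e.g. with df["revenue_sum"].to_numpy() on a real DataFrame.
revenue = np.array([1.0, 2.0, 3.0, 4.0])
revenue_running = running_sum(revenue)
```

For a plain running total like this, `np.cumsum` or the pandas `cumsum` method gives the same result without numba; a compiled loop is mainly worth it when the per-row logic has no vectorized equivalent.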

Answer 1: (score: 0)

IIUC, judging from your code and variable names, you are trying to compute an average:

import numpy as np
import pandas as pd

# toy data set:
country_list = ['us','gb']

np.random.seed(1)
datalen=10
df_day_country = pd.DataFrame({'country': np.random.choice(country_list, datalen),
                               'count': np.random.randint(0,100, datalen),
                               'revenue_sum': np.random.uniform(0,100,datalen)})


df_day_country['APRU'] =  (df_day_country.groupby('country',group_keys=False)
                            .apply(lambda x: x['revenue_sum']/x['count'].sum())
                          )

Output:

+----+---------+-------+-------------+----------+
|    | country | count | revenue_sum |   APRU   |
+----+---------+-------+-------------+----------+
|  0 | gb      |    16 |   20.445225 | 0.150333 |
|  1 | gb      |     1 |   87.811744 | 0.645675 |
|  2 | us      |    76 |    2.738759 | 0.011856 |
|  3 | us      |    71 |   67.046751 | 0.290246 |
|  4 | gb      |     6 |   41.730480 | 0.306842 |
|  5 | gb      |    25 |   55.868983 | 0.410801 |
|  6 | gb      |    50 |   14.038694 | 0.103226 |
|  7 | gb      |    20 |   19.810149 | 0.145663 |
|  8 | gb      |    18 |   80.074457 | 0.588783 |
|  9 | us      |    84 |   96.826158 | 0.419161 |
+----+---------+-------+-------------+----------+
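The same per-row result can probably also be had without `apply`, which tends to matter on large data. The sketch below is an assumption about what is wanted, not the asker's exact logic: it divides each row's `revenue_sum` by its country's total `count` using `transform`, then splits the result into one DataFrame per country, as the question asks. The toy values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the answer above.
df_day_country = pd.DataFrame({
    "country": ["us", "gb", "us", "gb"],
    "count": [10, 30, 30, 10],
    "revenue_sum": [4.0, 8.0, 12.0, 2.0],
})

# Total count per country, broadcast back onto every row of that country.
country_total = df_day_country.groupby("country")["count"].transform("sum")
df_day_country["APRU"] = df_day_country["revenue_sum"] / country_total

# One DataFrame per country, keyed by country code.
frames = {country: group for country, group in df_day_country.groupby("country")}
df_us = frames["us"]
```

Keeping the per-country frames in a dict avoids creating one variable per country by hand, so the code stays the same when the country list grows.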