Pandas: rolling sum with multiple groupby levels

Time: 2017-11-02 20:19:57

Tags: python pandas

I am trying to get a rolling sum of a DataFrame after grouping on multiple levels:

import pandas as pd
import numpy as np
year_vec = np.arange(2000, 2005)
month_vec = np.arange(1, 4)
soln_list = []
firmList = [61, 62, 63]
firmId = []
year_month = []
year = []
month = []
for firmIndex in range(0, len(firmList)):
    for yearIndex in range(0, len(year_vec)):
        for monthIndex in range(0, len(month_vec)):
            soln_list.append("soln_%s_%s_%s" % (firmList[firmIndex], year_vec[yearIndex], month_vec[monthIndex]))
            firmId.append(firmList[firmIndex])
            month.append(month_vec[monthIndex])
            year.append(year_vec[yearIndex])
            year_month.append("%s_%s" % (year_vec[yearIndex], month_vec[monthIndex]))

df = pd.DataFrame({'firmId': firmId, 'year': year, 'month': month, 'year_month' : year_month,
                   'soln_vars': soln_list})
df = df.set_index(["firmId", "year_month"])

The resulting DataFrame looks like this:

                      month       soln_vars  year
firmId year_month                             
61     2000_1          1  soln_61_2000_1  2000
       2000_2          2  soln_61_2000_2  2000
       2000_3          3  soln_61_2000_3  2000
       2001_1          1  soln_61_2001_1  2001
       2001_2          2  soln_61_2001_2  2001
       2001_3          3  soln_61_2001_3  2001
       2002_1          1  soln_61_2002_1  2002
        ...                   ...         ...

At this point I want the sum of soln_vars over every 2 years, across all months, for each firm. To do that, I first group by firmId and year and then sum:

  df = df.groupby([df.index.get_level_values(0), "year"])["soln_vars"].sum()

This gives me the sum of soln_vars for each firm and year:

firmId  year
61      2000    soln_61_2000_1soln_61_2000_2soln_61_2000_3
        2001    soln_61_2001_1soln_61_2001_2soln_61_2001_3
        2002    soln_61_2002_1soln_61_2002_2soln_61_2002_3
        2003    soln_61_2003_1soln_61_2003_2soln_61_2003_3
        2004    soln_61_2004_1soln_61_2004_2soln_61_2004_3
62      2000    soln_62_2000_1soln_62_2000_2soln_62_2000_3
        2001    soln_62_2001_1soln_62_2001_2soln_62_2001_3
        ...                    ...

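In my application, the solution variables are provided by another library, and summing them yields a mathematical expression: soln_61_2000_1 + soln_61_2000_2 + soln_61_2000_3. For simplicity I use strings here. However, then grouping by firmId and applying a rolling sum: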

  df = df.groupby(level=0, group_keys=False).rolling(2).sum()

does not change df. Any clarification on this would be appreciated.
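
A likely explanation is that rolling aggregations in pandas are meant for numeric data, so an object column of strings (or expression objects) is not summed. Depending on the pandas version, the column may be left untouched or an error may be raised. The following is only a sketch of one possible workaround, not a confirmed fix: it applies Python's `+` over a sliding window inside each firm. The helper `rolling_add` and the small `yearly` Series are hypothetical stand-ins for the yearly sums shown above.

```python
import pandas as pd


def rolling_add(values, window):
    """Combine each run of `window` consecutive items with `+`.

    Positions where the window is not yet full are set to None.
    (Hypothetical helper, not part of pandas.)
    """
    values = list(values)
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
            continue
        acc = values[i - window + 1]
        for v in values[i - window + 2 : i + 1]:
            acc = acc + v
        out.append(acc)
    return out


# Small stand-in for the per-firm, per-year sums (strings play the role of expressions).
idx = pd.MultiIndex.from_product(
    [[61, 62], [2000, 2001, 2002]], names=["firmId", "year"]
)
yearly = pd.Series(["s_%s_%s" % (f, y) for f, y in idx], index=idx, dtype=object)

# Apply the window separately inside each firm so it never spans two firms.
pieces = []
for firm, grp in yearly.groupby(level="firmId"):
    pieces.append(pd.Series(rolling_add(grp, 2), index=grp.index, dtype=object))
rolled = pd.concat(pieces)
print(rolled)
```

Because the helper relies only on `+`, it should work for any object that supports addition, which is the assumption made here about the expression objects from the other library.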

0 answers:

No answers yet