Python groupby syntax problem when mixing a list variable with strings

Date: 2015-01-12 11:30:24

Tags: python syntax pandas group-by

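I am trying to run a groupby that uses a combination of a variable and strings as the grouping fields/columns. Could someone help with this? It could otherwise take me a day to work out.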

Mix ='business_unit','isoname','planning_channel','is_tracked','planning_partner','week'

The following works:

dfJoinsP2 = dfJoinsP2.groupby(Mix)['joined_subs_cmap', 'initial_billed_subs', 'billed_d1', 'churn_d1' , 'churn_24h'].sum().reset_index()

But when I try to add an extra field named 'Period_Number', I get an error.

dfJoinsP2 = dfJoinsP2.groupby(Mix,'Period_Number')['joined_subs_cmap', 'initial_billed_subs', 'billed_d1', 'churn_d1' , 'churn_24h'].sum().reset_index()

1 Answer:

Answer 0 (score: 1)

Reproducing and illustrating your problem:

In [22]:
# imports, then define our cols and create a dummy df
import numpy as np
import pandas as pd
cols = ['business_unit','isoname','planning_channel','is_tracked','planning_partner','week','joined_subs_cmap', 'initial_billed_subs', 'billed_d1', 'churn_d1' , 'churn_24h', 'Period Number']
df = pd.DataFrame(columns=cols, data =np.random.randn(5, len(cols)))
df
Out[22]:
   business_unit   isoname  planning_channel  is_tracked  planning_partner  \
0      -0.818644  1.150678         -0.860677   -0.333496         -0.292689   
1       0.476575 -0.018507         -1.917119    0.360656          0.381106   
2       1.187570  1.105363          1.955066    0.154020          1.996389   
3       0.318762  0.962469          0.565538    0.671002         -0.675688   
4      -0.070671 -1.717793         -0.085815    0.089589          0.892412   

       week  joined_subs_cmap  initial_billed_subs  billed_d1  churn_d1  \
0 -0.681875          1.138119            -1.071672   0.409712 -1.066456   
1 -0.235040          0.559950             0.082890  -0.372671  0.804438   
2  1.707340          0.893437             0.316266   1.852508 -2.554488   
3 -2.055322          1.848388            -1.695563  -0.826089 -0.588229   
4 -0.325098          0.827455             0.535827  -0.930963  0.211628   

   churn_24h  Period Number  
0   1.067530       0.377579  
1   0.097042      -1.947681  
2  -0.327243      -1.137146  
3   0.230110       1.470183  
4   1.191042       2.167251  
In [23]:
# what you are trying to do
Mix ='business_unit','isoname','planning_channel','is_tracked','planning_partner','week'
df.groupby(Mix, 'Period Number')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-23-dc75b3902303> in <module>()
      1 Mix ='business_unit','isoname','planning_channel','is_tracked','planning_partner','week'
----> 2 df.groupby(Mix, 'Period Number')

C:\WinPython-64bit-3.3.3.2\python-3.3.3.amd64\lib\site-packages\pandas\core\generic.py in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze)
   2894         if level is None and by is None:
   2895             raise TypeError("You have to supply one of 'by' and 'level'")
-> 2896         axis = self._get_axis_number(axis)
   2897         return groupby(self, by=by, axis=axis, level=level, as_index=as_index,
   2898                        sort=sort, group_keys=group_keys, squeeze=squeeze)

C:\WinPython-64bit-3.3.3.2\python-3.3.3.amd64\lib\site-packages\pandas\core\generic.py in _get_axis_number(self, axis)
    294                 pass
    295         raise ValueError('No axis named {0} for object type {1}'
--> 296                          .format(axis, type(self)))
    297 
    298     def _get_axis_name(self, axis):

ValueError: No axis named Period Number for object type <class 'pandas.core.frame.DataFrame'>

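So you get a ValueError because 'Period Number' is passed as the second positional argument and is therefore interpreted as the axis value, which is of course invalid and not what you intended.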

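The other point here is that the way you define Mix produces a tuple. If it were a list, we could append the extra column of interest and everything would work: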

In [24]:

Mix = ['business_unit','isoname','planning_channel','is_tracked','planning_partner','week']
Mix.append('Period Number')
df.groupby(Mix)['joined_subs_cmap', 'initial_billed_subs', 'billed_d1', 'churn_d1' , 'churn_24h'].sum().reset_index()
Out[24]:
   business_unit   isoname  planning_channel  is_tracked  planning_partner  \
0      -0.818644  1.150678         -0.860677   -0.333496         -0.292689   
1      -0.070671 -1.717793         -0.085815    0.089589          0.892412   
2       0.318762  0.962469          0.565538    0.671002         -0.675688   
3       0.476575 -0.018507         -1.917119    0.360656          0.381106   
4       1.187570  1.105363          1.955066    0.154020          1.996389   

       week  Period Number  joined_subs_cmap  initial_billed_subs  billed_d1  \
0 -0.681875       0.377579          1.138119            -1.071672   0.409712   
1 -0.325098       2.167251          0.827455             0.535827  -0.930963   
2 -2.055322       1.470183          1.848388            -1.695563  -0.826089   
3 -0.235040      -1.947681          0.559950             0.082890  -0.372671   
4  1.707340      -1.137146          0.893437             0.316266   1.852508   

   churn_d1  churn_24h  
0 -1.066456   1.067530  
1  0.211628   1.191042  
2 -0.588229   0.230110  
3  0.804438   0.097042  
4 -2.554488  -0.327243
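
If you would rather keep Mix as a tuple and leave it unmodified, one option is to build a new list of keys at the call site. The following is only a minimal sketch: the small DataFrame and its values are made up for illustration, and only a few of the question's column names are used. It also selects the value columns with a list (double brackets), since, as far as I know, recent pandas releases reject the tuple-style selection shown above.

```python
import pandas as pd

# Small illustrative frame; column names are borrowed from the question,
# the values are invented for this sketch.
df = pd.DataFrame({
    'business_unit': ['A', 'A', 'B', 'B'],
    'week': [1, 1, 1, 2],
    'Period Number': [10, 10, 10, 11],
    'joined_subs_cmap': [1.0, 2.0, 3.0, 4.0],
    'churn_d1': [0.5, 0.5, 1.0, 2.0],
})

# Defined without brackets, so this is a tuple, as in the question.
Mix = 'business_unit', 'week'

# Build a new list of grouping keys instead of passing the extra column
# as a second positional argument to groupby.
keys = list(Mix) + ['Period Number']

# Select the value columns with a list, then aggregate.
result = df.groupby(keys)[['joined_subs_cmap', 'churn_d1']].sum().reset_index()
print(result)
```

Unlike Mix.append(...), this leaves Mix unchanged, so the same tuple can be reused for other groupings.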