Pandas: duplicated groupby response

Time: 2013-06-07 09:21:01

Tags: python group-by pandas

I have come across some very strange behavior in the current pandas groupby method. Consider the following DataFrame:

import datetime as DT
import pandas as pd

df = pd.DataFrame({
    'Branch': 'A A A A A B'.split(),
    'Buyer': 'Carl Mark Carl Joe Joe Carl'.split(),
    'Quantity': [1, 3, 5, 8, 9, 3],
    'Date': [
        DT.datetime(2013, 1, 1, 13, 0),
        DT.datetime(2013, 1, 1, 13, 5),
        DT.datetime(2013, 10, 1, 20, 0),
        DT.datetime(2013, 10, 2, 10, 0),
        DT.datetime(2013, 12, 2, 12, 0),
        DT.datetime(2013, 12, 2, 14, 0),
    ]})

If I group by week and by branch like this:

gr = df.groupby([df.Date.map(lambda d: d.week), 'Branch'])

...and inspect the resulting sub-frames with:

def testgr(df):
    # Print each sub-frame that apply passes in
    print(df)

gr.apply(testgr)

I get the first group (shown below) twice, while every other group appears only once:

  Branch Buyer                Date  Quantity
0      A  Carl 2013-01-01 13:00:00         1
1      A  Mark 2013-01-01 13:05:00         3

Am I missing something here?

Many thanks

Andy

1 Answer:

Answer 0: (score: 0)

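apply iterates over each group of the DataFrame. In pandas versions of that era it also called the function twice on the first group, to decide whether it could take a fast or a slow code path, so a function with side effects such as printing shows that group twice.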

A better way to print the groups is to use the group values:

In [31]: for v in g.groups.values(): print (df.iloc[v])
  Branch Buyer                Date  Quantity  week
0      A  Carl 2013-01-01 13:00:00         1     1
1      A  Mark 2013-01-01 13:05:00         3     1
  Branch Buyer                Date  Quantity  week
2      A  Carl 2013-10-01 20:00:00         5    40
3      A   Joe 2013-10-02 10:00:00         8    40
  Branch Buyer                Date  Quantity  week
4      A   Joe 2013-12-02 12:00:00         9    49
  Branch Buyer                Date  Quantity  week
5      B  Carl 2013-12-02 14:00:00         3    49
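
Iterating over the GroupBy object itself is another way to visit each group exactly once. The following is only a minimal sketch: it rebuilds the question's DataFrame, and the names `week` and `seen` are introduced here for illustration and are not part of the original post.

```python
import datetime as DT
import pandas as pd

df = pd.DataFrame({
    'Branch': 'A A A A A B'.split(),
    'Buyer': 'Carl Mark Carl Joe Joe Carl'.split(),
    'Quantity': [1, 3, 5, 8, 9, 3],
    'Date': [
        DT.datetime(2013, 1, 1, 13, 0),
        DT.datetime(2013, 1, 1, 13, 5),
        DT.datetime(2013, 10, 1, 20, 0),
        DT.datetime(2013, 10, 2, 10, 0),
        DT.datetime(2013, 12, 2, 12, 0),
        DT.datetime(2013, 12, 2, 14, 0),
    ]})

# Week number of each row, computed the same way as in the question
week = df['Date'].map(lambda d: d.week).rename('week')

seen = []  # hypothetical helper list recording which groups were visited
for (wk, branch), sub in df.groupby([week, 'Branch']):
    # Each (week, branch) key is yielded once, together with its sub-frame
    seen.append((int(wk), branch, list(sub.index)))
    print(wk, branch)
    print(sub)
```

Because the loop body runs directly on each group, nothing is evaluated a second time for the first group.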

Compare:

In [32]: g.apply(testgr)
  Branch Buyer                Date  Quantity  week
0      A  Carl 2013-01-01 13:00:00         1     1
1      A  Mark 2013-01-01 13:05:00         3     1
  Branch Buyer                Date  Quantity  week
0      A  Carl 2013-01-01 13:00:00         1     1  <- this dataframe is printed twice
1      A  Mark 2013-01-01 13:05:00         3     1
  Branch Buyer                Date  Quantity  week
2      A  Carl 2013-10-01 20:00:00         5    40  <- this one isn't...
3      A   Joe 2013-10-02 10:00:00         8    40
  Branch Buyer                Date  Quantity  week
4      A   Joe 2013-12-02 12:00:00         9    49
  Branch Buyer                Date  Quantity  week
5      B  Carl 2013-12-02 14:00:00         3    49
Out[32]:
Date  Branch
1     A         None
40    A         None
49    A         None
      B         None
dtype: object
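
The result in Out[32] still contains one entry per group, even though the first sub-frame was printed twice. A rough sketch of the practical consequence, assuming the same data as in the question: keep the function passed to apply free of side effects and rely on its return value. The names `week`, `calls`, and `total` are made up for this illustration.

```python
import datetime as DT
import pandas as pd

df = pd.DataFrame({
    'Branch': 'A A A A A B'.split(),
    'Quantity': [1, 3, 5, 8, 9, 3],
    'Date': [
        DT.datetime(2013, 1, 1, 13, 0),
        DT.datetime(2013, 1, 1, 13, 5),
        DT.datetime(2013, 10, 1, 20, 0),
        DT.datetime(2013, 10, 2, 10, 0),
        DT.datetime(2013, 12, 2, 12, 0),
        DT.datetime(2013, 12, 2, 14, 0),
    ]})

week = df['Date'].map(lambda d: d.week).rename('week')

calls = []  # side-effect log; may hold an extra entry for the first group on old pandas

def total(quantities):
    # Record the call (a side effect), then return the value we actually care about
    calls.append(list(quantities.index))
    return quantities.sum()

totals = df.groupby([week, 'Branch'])['Quantity'].apply(total)
print(totals)              # one row per (week, Branch) group
print(len(calls), "calls") # the number of calls depends on the pandas version
```

However many times the function is invoked, `totals` holds exactly one value per group, so only the side effects are sensitive to the extra call.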