I've run into some very strange behavior in the current pandas groupby method. Consider the following DataFrame:
import datetime as DT
import pandas as pd

df = pd.DataFrame({
    'Branch': 'A A A A A B'.split(),
    'Buyer': 'Carl Mark Carl Joe Joe Carl'.split(),
    'Quantity': [1, 3, 5, 8, 9, 3],
    'Date': [
        DT.datetime(2013, 1, 1, 13, 0),
        DT.datetime(2013, 1, 1, 13, 5),
        DT.datetime(2013, 10, 1, 20, 0),
        DT.datetime(2013, 10, 2, 10, 0),
        DT.datetime(2013, 12, 2, 12, 0),
        DT.datetime(2013, 12, 2, 14, 0),
    ]})
If I group by week and Branch like this:
gr = df.groupby([df.Date.map(lambda d: d.week), 'Branch'])
...and look at the sub-frames that are created using:
def testgr(df):
    print(df)
gr.apply(testgr)
I get the first group printed twice, while every other group appears only once:
Branch Buyer Date Quantity
0 A Carl 2013-01-01 13:00:00 1
1 A Mark 2013-01-01 13:05:00 3
Am I missing something here?
Many thanks,
Andy
Answer 0 (score: 0)
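apply calls the function on each group of the DataFrame. In pandas versions of that era it also called the function twice on the first group, to decide whether it could take a fast or a slow code path, so side effects such as printing show up twice for that group.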
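To check how often apply invokes the function in whatever pandas version is installed, the calls can be recorded instead of printed. The following is only a sketch: the `week` Series, the `record` helper, and the `calls` list are names made up for illustration, and the week is computed with `dt.isocalendar()` rather than the lambda from the question.

```python
import datetime as DT
import pandas as pd

df = pd.DataFrame({
    'Branch': 'A A A A A B'.split(),
    'Quantity': [1, 3, 5, 8, 9, 3],
    'Date': [
        DT.datetime(2013, 1, 1, 13, 0),
        DT.datetime(2013, 1, 1, 13, 5),
        DT.datetime(2013, 10, 1, 20, 0),
        DT.datetime(2013, 10, 2, 10, 0),
        DT.datetime(2013, 12, 2, 12, 0),
        DT.datetime(2013, 12, 2, 14, 0),
    ]})

# Hypothetical helper: ISO week number of each date, kept as a separate Series.
week = df['Date'].dt.isocalendar().week.astype(int)

calls = []  # one entry per invocation: the row labels of the sub-frame received

def record(sub):
    calls.append(tuple(sub.index))
    return len(sub)

# Select the value column explicitly so the function only sees 'Quantity'.
sizes = df.groupby([week, 'Branch'])[['Quantity']].apply(record)

print(len(calls), "calls for", len(set(calls)), "distinct groups")
```

If the number of calls exceeds the number of distinct groups, the installed version is evaluating some group more than once, which is why functions passed to apply are best kept free of side effects.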
A better way to print the groups is to use the group values:
In [31]: for v in g.groups.values(): print (df.iloc[v])
Branch Buyer Date Quantity week
0 A Carl 2013-01-01 13:00:00 1 1
1 A Mark 2013-01-01 13:05:00 3 1
Branch Buyer Date Quantity week
2 A Carl 2013-10-01 20:00:00 5 40
3 A Joe 2013-10-02 10:00:00 8 40
Branch Buyer Date Quantity week
4 A Joe 2013-12-02 12:00:00 9 49
Branch Buyer Date Quantity week
5 B Carl 2013-12-02 14:00:00 3 49
Compare:
In [32]: g.apply(testgr)
Branch Buyer Date Quantity week
0 A Carl 2013-01-01 13:00:00 1 1
1 A Mark 2013-01-01 13:05:00 3 1
Branch Buyer Date Quantity week
0 A Carl 2013-01-01 13:00:00 1 1 <- this dataframe is printed twice
1 A Mark 2013-01-01 13:05:00 3 1
Branch Buyer Date Quantity week
2 A Carl 2013-10-01 20:00:00 5 40 <- this one isn't...
3 A Joe 2013-10-02 10:00:00 8 40
Branch Buyer Date Quantity week
4 A Joe 2013-12-02 12:00:00 9 49
Branch Buyer Date Quantity week
5 B Carl 2013-12-02 14:00:00 3 49
Out[32]:
Date Branch
1 A None
40 A None
49 A None
B None
dtype: object
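As a possible alternative to indexing through `g.groups`, a groupby object can also be iterated directly, which yields each (key, sub-frame) pair once. This is a minimal sketch, not the answerer's code: the `week` Series and the `seen` list are illustrative names, and the week is again computed with `dt.isocalendar()`.

```python
import datetime as DT
import pandas as pd

df = pd.DataFrame({
    'Branch': 'A A A A A B'.split(),
    'Buyer': 'Carl Mark Carl Joe Joe Carl'.split(),
    'Quantity': [1, 3, 5, 8, 9, 3],
    'Date': [
        DT.datetime(2013, 1, 1, 13, 0),
        DT.datetime(2013, 1, 1, 13, 5),
        DT.datetime(2013, 10, 1, 20, 0),
        DT.datetime(2013, 10, 2, 10, 0),
        DT.datetime(2013, 12, 2, 12, 0),
        DT.datetime(2013, 12, 2, 14, 0),
    ]})

# Hypothetical helper: ISO week number of each date.
week = df['Date'].dt.isocalendar().week.astype(int)

seen = []  # (week, branch, number of rows) for every group visited
for (wk, branch), sub in df.groupby([week, 'Branch']):
    print(wk, branch)
    print(sub)
    seen.append((wk, branch, len(sub)))
```

Because no function is handed to pandas here, there is nothing for it to evaluate speculatively, so each sub-frame is printed exactly once.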