Consider the following DataFrame:
import pandas as pd

records = [{'item': 'Widget A', 'quantity': 50, 'revenue': 25.0, 'trandate': '2016-3-24'},
           {'item': 'Widget B', 'quantity': 6, 'revenue': 72.0, 'trandate': '2016-3-28'},
           {'item': 'Widget C', 'quantity': 5, 'revenue': 75.0, 'trandate': '2016-3-28'},
           {'item': 'Widget A', 'quantity': 168, 'revenue': 84.0, 'trandate': '2016-3-29'},
           {'item': 'Widget B', 'quantity': 6, 'revenue': 84.0, 'trandate': '2016-3-29'}]
indices = [487, 488, 493, 495, 497]
df = pd.DataFrame(records, index=indices)
which yields:
id item quantity revenue trandate
487 Widget A 50 25.0 2016-3-24
488 Widget B 6 72.0 2016-3-28
493 Widget C 5 75.0 2016-3-28
495 Widget A 168 84.0 2016-3-29
497 Widget B 6 84.0 2016-3-29
I need to split this DataFrame into two complementary sets:
A DataFrame that contains the first transaction for each item:
id item quantity revenue trandate
487 Widget A 50 25.0 2016-3-24
488 Widget B 6 72.0 2016-3-28
493 Widget C 5 75.0 2016-3-28
A DataFrame that excludes the first transaction for each item:
id item quantity revenue trandate
495 Widget A 168 84.0 2016-3-29
497 Widget B 6 84.0 2016-3-29
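A minimal sketch of one way to get both sets at once, using DataFrame.duplicated instead of a groupby (a swapped-in technique, not the answer below). It assumes "first" means the first row for each item in the current row order, which holds here because the sample is already sorted by trandate. On real data, sort first, and note that non-zero-padded date strings compare lexicographically ('2016-3-9' sorts after '2016-3-24'), so converting trandate with pd.to_datetime before sorting is safer.

```python
import pandas as pd

records = [{'item': 'Widget A', 'quantity': 50, 'revenue': 25.0, 'trandate': '2016-3-24'},
           {'item': 'Widget B', 'quantity': 6, 'revenue': 72.0, 'trandate': '2016-3-28'},
           {'item': 'Widget C', 'quantity': 5, 'revenue': 75.0, 'trandate': '2016-3-28'},
           {'item': 'Widget A', 'quantity': 168, 'revenue': 84.0, 'trandate': '2016-3-29'},
           {'item': 'Widget B', 'quantity': 6, 'revenue': 84.0, 'trandate': '2016-3-29'}]
df = pd.DataFrame(records, index=[487, 488, 493, 495, 497])

# duplicated(subset='item', keep='first') is False for the first row of each
# item (in the current row order) and True for every later row of that item.
is_repeat = df.duplicated(subset='item', keep='first')

first_tx = df[~is_repeat]  # first transaction per item, original index kept
later_tx = df[is_repeat]   # the complementary set

print(first_tx)
print(later_tx)
```

Because both frames come from the same boolean mask and its negation, every row lands in exactly one of them.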
I would like to filter df by a GroupBy object, but I can't get df's index to show up after I group by:
>>> gb = df.groupby('item')
>>> gb.groups
{'Widget A': [487, 495], 'Widget B': [488, 497], 'Widget C': [493]}
>>> gb['trandate'].min()
item
Widget A 2016-3-24
Widget B 2016-3-28
Widget C 2016-3-28
Can I use GroupBy to yield a DataFrame like this?
id item trandate
487 Widget A 2016-3-24
488 Widget B 2016-3-28
493 Widget C 2016-3-28
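A hedged sketch of how GroupBy itself can produce that shape: transform('min') broadcasts each item's earliest trandate back onto the original rows, so filtering with it keeps df's index labels. This is one possible approach, not the one in the answer below. It compares trandate as strings, exactly as gb['trandate'].min() does in the question, so the lexicographic caveat for non-zero-padded dates applies. If an item had two rows on its earliest date, both would be kept.

```python
import pandas as pd

records = [{'item': 'Widget A', 'quantity': 50, 'revenue': 25.0, 'trandate': '2016-3-24'},
           {'item': 'Widget B', 'quantity': 6, 'revenue': 72.0, 'trandate': '2016-3-28'},
           {'item': 'Widget C', 'quantity': 5, 'revenue': 75.0, 'trandate': '2016-3-28'},
           {'item': 'Widget A', 'quantity': 168, 'revenue': 84.0, 'trandate': '2016-3-29'},
           {'item': 'Widget B', 'quantity': 6, 'revenue': 84.0, 'trandate': '2016-3-29'}]
df = pd.DataFrame(records, index=[487, 488, 493, 495, 497])

# transform('min') returns a Series aligned with df's index in which every
# row holds the earliest trandate of its own item.
earliest = df.groupby('item')['trandate'].transform('min')

# Keep only rows dated on their item's earliest date; index labels survive.
first_tx = df.loc[df['trandate'] == earliest, ['item', 'trandate']]
print(first_tx)
```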
Answer 0 (score: 3)
I think you need to filter with a boolean mask created by cumcount, which numbers the rows within each item starting from 0:
print (df.groupby('item').cumcount())
487 0
488 0
493 0
495 1
497 1
dtype: int64
print (df[df.groupby('item').cumcount() == 0])
item quantity revenue trandate
487 Widget A 50 25.0 2016-3-24
488 Widget B 6 72.0 2016-3-28
493 Widget C 5 75.0 2016-3-28
print (df[df.groupby('item').cumcount() > 0])
item quantity revenue trandate
495 Widget A 168 84.0 2016-3-29
497 Widget B 6 84.0 2016-3-29
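As a hedged alternative to the cumcount mask, GroupBy.head(1) returns the first row of each group in the original row order and keeps the original index labels (unlike GroupBy.first(), which re-indexes the result by item). The complement can then be taken with drop. This sketch assumes the index labels are unique, as they are here; with duplicate labels, drop would remove too many rows and the cumcount mask above is the safer choice.

```python
import pandas as pd

records = [{'item': 'Widget A', 'quantity': 50, 'revenue': 25.0, 'trandate': '2016-3-24'},
           {'item': 'Widget B', 'quantity': 6, 'revenue': 72.0, 'trandate': '2016-3-28'},
           {'item': 'Widget C', 'quantity': 5, 'revenue': 75.0, 'trandate': '2016-3-28'},
           {'item': 'Widget A', 'quantity': 168, 'revenue': 84.0, 'trandate': '2016-3-29'},
           {'item': 'Widget B', 'quantity': 6, 'revenue': 84.0, 'trandate': '2016-3-29'}]
df = pd.DataFrame(records, index=[487, 488, 493, 495, 497])

# First row of each item, original index labels preserved.
first_tx = df.groupby('item').head(1)

# Everything else: drop the labels selected above (requires a unique index).
later_tx = df.drop(first_tx.index)

# Same result as the cumcount-based mask from the answer.
mask = df.groupby('item').cumcount() == 0
print(first_tx.equals(df[mask]), later_tx.equals(df[~mask]))
```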