使用此数据
import pandas as pd
df=pd.read_excel(
"https://github.com/chris1610/pbpython/blob/master/data/sample-salesv3.xlsx?raw=True"
)
df["date"] = pd.to_datetime(df['date']) df.head()
我使用此代码按月将ext price
和name
分组了
df.groupby([pd.Grouper(key='date', freq='M'), 'name'])['ext price'].sum()
出:
date name
2014-01-31 Barton LLC 6177.57
Cronin, Oberbrunner and Spencer 1141.75
Frami, Hills and Schmidt 5112.34
Fritsch, Russel and Anderson 15130.77
Halvorson, Crona and Champlin 9997.17
Herman LLC 10749.84
Jerde-Hilpert 11274.33
Kassulke, Ondricka and Metz 7322.83
Keeling LLC 6847.86
Kiehn-Spinka 8097.50
Koepp Ltd 10768.33
Kuhn-Gusikowski 7309.54
Kulas Inc 15398.87
Pollich LLC 1004.22
Purdy-Kunde 4689.37
Sanford and Sons 9544.13
Stokes LLC 5809.34
Trantow-Barrows 14328.26
White-Trantow 13703.77
Will LLC 20953.87
2014-02-28 Barton LLC 12218.03
Cronin, Oberbrunner and Spencer 13976.26
Frami, Hills and Schmidt 4124.53
Fritsch, Russel and Anderson 9595.35
Halvorson, Crona and Champlin 7082.15
Herman LLC 5831.40
Jerde-Hilpert 4088.40
Kassulke, Ondricka and Metz 3061.12
Keeling LLC 3383.45
Kiehn-Spinka 3461.12
...
2014-11-30 Koepp Ltd 4882.27
Kuhn-Gusikowski 7197.89
Kulas Inc 4149.34
Pollich LLC 6334.21
Purdy-Kunde 2376.00
Sanford and Sons 6834.04
Stokes LLC 6158.81
Trantow-Barrows 6550.10
White-Trantow 9544.61
Will LLC 3210.44
2014-12-31 Barton LLC 2772.90
Cronin, Oberbrunner and Spencer 7640.60
Frami, Hills and Schmidt 16249.81
Fritsch, Russel and Anderson 12345.64
Halvorson, Crona and Champlin 2900.51
Herman LLC 4664.54
Jerde-Hilpert 6941.99
Kassulke, Ondricka and Metz 4425.22
Keeling LLC 13247.88
Kiehn-Spinka 17401.28
Koepp Ltd 11791.00
Kuhn-Gusikowski 4959.85
Kulas Inc 6106.38
Pollich LLC 12357.76
Purdy-Kunde 4051.79
Sanford and Sons 2151.48
Stokes LLC 6366.26
Trantow-Barrows 10124.23
White-Trantow 4806.93
Will LLC 12561.21
Name: ext price, Length: 240, dtype: float64
现在,我尝试获取每个name
ext price
(前month
)
我尝试过nlargest(5)
,但是它不起作用
函数head(5)
也不能解决问题
答案 0 :(得分:1)
也许有一种更快的方法,但是您可以按价格排序,按日期分组,获取前5名,然后对索引(也就是日期)进行排序。
s = df.groupby([pd.Grouper(key='date', freq='M'), 'name'])['ext price'].sum()
out = (s.reset_index(level=-1)
.sort_values('ext price', ascending=False)
.groupby('date')
.head(5)
.sort_index()
)
out
# returns:
name ext price
date
2014-01-31 Will LLC 20953.87
2014-01-31 White-Trantow 13703.77
2014-01-31 Trantow-Barrows 14328.26
2014-01-31 Fritsch, Russel and Anderson 15130.77
2014-01-31 Kulas Inc 15398.87
2014-02-28 Will LLC 13613.06
2014-02-28 Kulas Inc 13973.65
2014-02-28 Cronin, Oberbrunner and Spencer 13976.26
2014-02-28 White-Trantow 11783.98
2014-02-28 Barton LLC 12218.03
...
2014-10-31 White-Trantow 19896.85
2014-10-31 Kulas Inc 15754.28
2014-10-31 Kuhn-Gusikowski 10559.77
2014-10-31 Jerde-Hilpert 11401.27
2014-10-31 Fritsch, Russel and Anderson 10686.18
2014-11-30 Keeling LLC 10310.96
2014-11-30 White-Trantow 9544.61
2014-11-30 Kassulke, Ondricka and Metz 8836.82
2014-11-30 Kuhn-Gusikowski 7197.89
2014-11-30 Frami, Hills and Schmidt 7221.02
2014-12-31 Frami, Hills and Schmidt 16249.81
2014-12-31 Pollich LLC 12357.76
2014-12-31 Will LLC 12561.21
2014-12-31 Kiehn-Spinka 17401.28
2014-12-31 Keeling LLC 13247.88
答案 1 :(得分:1)
另一种选择是nlargest
,但可能不会比James的建议快,因为排序和获取head
或tail
应该比nlargest
快:
new = df.groupby([pd.Grouper(key='date', freq='M'), 'name'])['ext price'].sum()
new.groupby(level=0).nlargest(5).sort_index().reset_index(level=1, drop=True).to_frame()
答案 2 :(得分:0)
您可以为此使用idxmax
函数:
df[df.groupby('month')['ext price'].idxmax()]
答案 3 :(得分:0)
我相信这就是您想要的:
df = df.groupby([pd.Grouper(key="date", freq="M"), "name"])["ext price"].sum()
out = df.groupby(level=0).nlargest(5)
out.index = out.index.droplevel()
然后您将获得以下数据:
In [134]: out.head(10)
Out[134]:
date name
2014-01-31 Will LLC 20953.87
Kulas Inc 15398.87
Fritsch, Russel and Anderson 15130.77
Trantow-Barrows 14328.26
White-Trantow 13703.77
2014-02-28 Cronin, Oberbrunner and Spencer 13976.26
Kulas Inc 13973.65
Will LLC 13613.06
Barton LLC 12218.03
White-Trantow 11783.98
dtype: float64