Dynamically finding the n largest values in each column in pandas

Time: 2018-06-14 15:45:38

Tags: python pandas

I have a time-series dataset that looks like this:

Date        Newspaper   City1    City2   Region1Total   City3   City4  Region2Total
2017-12-01  NewsPaper1  231563   8696    240259         21072   8998   30070
2017-12-01  NewsPaper2  173009   12180   185189         28910   5550   34460
2017-12-01  NewsPaper3  40511    4600    45111          5040    3330   8370
2017-12-01  NewsPaper4  37770    2980    40750          6520    1880   8400
2017-12-01  NewsPaper5  5176     900     6076           1790    5000   6790
2017-12-01  NewsPaper6  137650   8025    145675         25300  11000   36300
2017-12-01  Total       637547   38201   675748         91032  36558   127590

2018-01-01  NewsPaper1  231295   8391    239686         8790   21176   29966
2018-01-01  NewsPaper2  169937   12130   182067         7890   28850   36740
2018-01-01  NewsPaper3  40453    4570    45023          4750   5055    9800
2018-01-01  NewsPaper4  37766    2970    40736          2500   6540    9040
2018-01-01  NewsPaper5  5136     900     6036           5600   1795    7365
2018-01-01  NewsPaper6  137990   8010    146000         14500  25330   39830
2018-01-01  Total       633919   37786   671705         44980  91141   136121 

I am trying to find the n largest values in each column of this dataframe. This is what I have tried:

import pandas as pd

somelist = []
data = pd.read_csv('newspaper.csv')
data.index = pd.to_datetime(data['Date'], errors='coerce')
# consider only the latest month in the dataframe
last_month = data.loc[data.index[-1]].copy()
last_month.set_index('Newspaper', inplace=True)
for city in last_month.iloc[:, 2:]:
    # the largest value is the Total row, so take the top 4 and skip the first
    top_3 = last_month[city].nlargest(4)[1:]
    somelist.append(top_3)
print(somelist)

This produces a list of pandas Series, with each column's name printed below its values:

    [Newspaper
    Newspaper1    231295
    Newspaper2    169937
    Newspaper6    137990
    Name: City1, dtype: float64, Newspaper
    Newspaper2    12130.0
    Newspaper1     8391.0
    Newspaper6     8010.0
    Name: City2, dtype: float64, Newspaper
    Newspaper1    240259
    Newspaper2    185189
    Newspaper6    145675
    Name: Region1Total, dtype: float64, Newspaper
    Newspaper6    14500.0
    Newspaper1     8790.0
    Newspaper2     7890.0
    Name: City3, dtype: float64, Newspaper
    Newspaper2    28850.0
    Newspaper6    25330.0
    Newspaper1    21176.0
    Name: City4, dtype: float64, Newspaper
    Newspaper6    36300
    Newspaper2    34460
    Newspaper1    34460
    Name: Region2Total, dtype: float64, Newspaper]

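What I want is the top three newspapers for each city and region, together with their sales figures in descending order. I would also like the name of the city/region to be printed before its top 3 results.

The expected output is a list or Series like the following: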

Newspaper     City1
Newspaper1    231295
Newspaper2    169937
Newspaper6    137990

Newspaper     City2
Newspaper2    12130.0
Newspaper1     8391.0
Newspaper6     8010.0

Newspaper     Region1Total
Newspaper1    240259
Newspaper2    185189
Newspaper6    145675

Newspaper     City3
Newspaper6    14500.0
Newspaper1     8790.0
Newspaper2     7890.0

Newspaper     City4
Newspaper2    28850.0
Newspaper6    25330.0
Newspaper1    21176.0

Newspaper     Region2Total
Newspaper6    36300
Newspaper2    34460
Newspaper1    34460
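
One possible way to get output in this shape is sketched below. It is not from the original post: the helper name `top_n_per_column` and its `skip_label` parameter are made up for illustration, and the small frame is a stand-in built from the 2018-01-01 rows above (only two columns, to keep it short). It drops the Total row first rather than slicing it off the `nlargest` result, then prints each column name before its values.

```python
import pandas as pd

# Stand-in for the latest month (2018-01-01 rows from the table above).
last_month = pd.DataFrame(
    {
        "Newspaper": ["NewsPaper1", "NewsPaper2", "NewsPaper3", "NewsPaper4",
                      "NewsPaper5", "NewsPaper6", "Total"],
        "City1": [231295, 169937, 40453, 37766, 5136, 137990, 633919],
        "City2": [8391, 12130, 4570, 2970, 900, 8010, 37786],
    }
).set_index("Newspaper")


def top_n_per_column(frame, n=3, skip_label="Total"):
    """Hypothetical helper: map each column name to its n largest values,
    ignoring the row labelled skip_label (if present)."""
    rows = frame.drop(index=skip_label, errors="ignore")
    # Series.nlargest returns the values sorted in descending order
    return {col: rows[col].nlargest(n) for col in rows.columns}


top3 = top_n_per_column(last_month, n=3)
for name, series in top3.items():
    print(name)                # city/region name first
    print(series.to_string())  # then newspaper and sales figure
    print()
```

Keeping the results in a dict keyed by column name means n can be changed in one place, and the name is available for printing without relying on the Series repr.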

Also, if I want to skip the regions and consider only the cities, how would I do that? Any help would be greatly appreciated. Thank you in advance.
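
For the cities-only case, a minimal sketch is to filter the columns by name before looping. This assumes that region columns can be recognised by a name starting with "Region" (true for Region1Total and Region2Total in the table above, but an assumption about the real file); the frame below is again a small stand-in taken from the 2018-01-01 rows.

```python
import pandas as pd

# Stand-in rows from the 2018-01-01 data in the question.
last_month = pd.DataFrame(
    {
        "City1": [231295, 169937, 137990],
        "City2": [8391, 12130, 8010],
        "Region1Total": [239686, 182067, 146000],
        "City3": [8790, 7890, 14500],
    },
    index=pd.Index(["NewsPaper1", "NewsPaper2", "NewsPaper6"], name="Newspaper"),
)

# Assumption: any column whose name starts with "Region" is a regional total.
city_cols = [c for c in last_month.columns if not c.startswith("Region")]
cities_only = last_month[city_cols]

for col in cities_only.columns:
    print(col)
    print(cities_only[col].nlargest(2).to_string())
    print()
```

If the naming convention differs, an explicit list of city column names would work the same way in place of the `startswith` test.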

0 answers:

No answers