Sorting and regrouping in pandas using multiple conditions

Time: 2020-10-03 15:16:46

Tags: python pandas

Here is the sample data:

Product     Type        Name    Time        Value
Product a   Medicare    CVS     2018-10-05  10
Product a   Medicare    Cigna   2018-10-05  20
Product a   Medicare    United  2018-10-05  30
Product a   Medicare    Humana  2018-10-05  40
Product a   Medicare    Centene 2018-10-05  50
Product a   Comm        CVS     2018-10-05  20
Product a   Comm        Cigna   2018-10-05  30
Product a   Comm        United  2018-10-05  40
Product a   Comm        Humana  2018-10-05  50
Product a   Comm        Centene 2018-10-05  60
Product a   Medicare    CVS     2019-10-03  30
Product a   Medicare    Cigna   2019-10-03  20
Product a   Medicare    United  2019-10-03  10
Product a   Medicare    Humana  2019-10-03  5
Product a   Medicare    Centene 2019-10-03  12
Product a   Comm        CVS     2019-10-03  87
Product a   Comm        Cigna   2019-10-03  43
Product a   Comm        United  2019-10-03  50
Product a   Comm        Humana  2019-10-03  30
Product a   Comm        Centene 2019-10-03  90

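First, I need to find the most recent week in "Time".

In the table above, that is 2019-10-03.

For that week, I need to sort by "Value" and find the top 2 "Name" entries for each "Type".

Then I need to create a dataframe like the one below, keeping only those names in every week.

For the week of 2019-10-03, the top 2 "Name" entries for "Medicare" are CVS and Cigna. For the same week, the top 2 "Name" entries for "Comm" are Centene and CVS.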

Product     Type        Name    Time        Value
Product a   Medicare    CVS     2018-10-05  10
Product a   Medicare    Cigna   2018-10-05  20
Product a   Comm        Centene 2018-10-05  60
Product a   Comm        CVS     2018-10-05  20
Product a   Medicare    CVS     2019-10-03  30
Product a   Medicare    Cigna   2019-10-03  20
Product a   Comm        Centene 2019-10-03  90
Product a   Comm        CVS     2019-10-03  87


2 answers:

Answer 0 (score: 1)

IIUC, first sort the dataframe, then group and use head:

df.sort_values('Value', ascending=False)\
  .groupby(['Product', 'Type', 'Time'])\
  .head(2)\
  .sort_index()

Output:

      Product      Type     Name        Time  Value
3   Product a  Medicare   Humana  2018-10-05     40
4   Product a  Medicare  Centene  2018-10-05     50
8   Product a      Comm   Humana  2018-10-05     50
9   Product a      Comm  Centene  2018-10-05     60
10  Product a  Medicare      CVS  2019-10-03     30
11  Product a  Medicare    Cigna  2019-10-03     20
15  Product a      Comm      CVS  2019-10-03     87
19  Product a      Comm  Centene  2019-10-03     90
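
Note that this output takes the top 2 within each week separately, so the 2018-10-05 rows (Humana and Centene) differ from the result the question asks for, where the names chosen in the latest week are kept for every week. The sketch below rebuilds the sample data and reproduces that behaviour. The way `df` is constructed here is an assumption, since the question only shows the table; the variable names `top2_per_week` and `names_2018` are illustrative.

```python
import pandas as pd

# Rebuild the sample table from the question (construction is an assumption).
names = ["CVS", "Cigna", "United", "Humana", "Centene"]
values = {
    ("Medicare", "2018-10-05"): [10, 20, 30, 40, 50],
    ("Comm", "2018-10-05"): [20, 30, 40, 50, 60],
    ("Medicare", "2019-10-03"): [30, 20, 10, 5, 12],
    ("Comm", "2019-10-03"): [87, 43, 50, 30, 90],
}
rows = [
    ("Product a", typ, name, time, value)
    for (typ, time), vals in values.items()
    for name, value in zip(names, vals)
]
df = pd.DataFrame(rows, columns=["Product", "Type", "Name", "Time", "Value"])

# Same approach as above: top 2 by Value within each Product/Type/Time group.
top2_per_week = (
    df.sort_values("Value", ascending=False)
      .groupby(["Product", "Type", "Time"])
      .head(2)
      .sort_index()
)

# Names selected for the earlier week; these are that week's own top 2,
# not the names that lead in the latest week.
early = top2_per_week[top2_per_week["Time"] == "2018-10-05"]
names_2018 = sorted(early["Name"].unique())
print(names_2018)
```

If the per-week top 2 is what you want, this is enough; if the names must come from the latest week only, the selection has to be made on that week first and then applied to the other weeks.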

Answer 1 (score: 1)

First find the top Product/Type/Name combinations in the latest week, then use merge to keep only those combinations across all weeks:

import pandas as pd

df['Time'] = pd.to_datetime(df['Time'])

# Top 2 rows per Product/Type within the latest week
df1 = (df[df['Time'].eq(df['Time'].max())]
       .sort_values('Value', ascending=False)
       .groupby(['Product', 'Type'])
       .head(2))
print(df1)
      Product      Type     Name       Time  Value
19  Product a      Comm  Centene 2019-10-03     90
15  Product a      Comm      CVS 2019-10-03     87
10  Product a  Medicare      CVS 2019-10-03     30
11  Product a  Medicare    Cigna 2019-10-03     20

# Inner merge on the shared columns keeps only the selected names in every week
df = (df.merge(df1[['Product', 'Type', 'Name']])
        .sort_values(['Product', 'Time', 'Type', 'Value'],
                     ascending=[True, True, True, False]))
print(df)
     Product      Type     Name       Time  Value
6  Product a      Comm  Centene 2018-10-05     60
4  Product a      Comm      CVS 2018-10-05     20
2  Product a  Medicare    Cigna 2018-10-05     20
0  Product a  Medicare      CVS 2018-10-05     10
7  Product a      Comm  Centene 2019-10-03     90
5  Product a      Comm      CVS 2019-10-03     87
1  Product a  Medicare      CVS 2019-10-03     30
3  Product a  Medicare    Cigna 2019-10-03     20
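
The two steps above can be wrapped into one function, as in the self-contained sketch below. The helper name `keep_latest_top_n` and the way the sample `df` is built are assumptions for illustration, not part of the original answer. The index is reset at the end because the row labels produced by merge can differ between pandas versions.

```python
import pandas as pd

def keep_latest_top_n(df, n=2):
    """Keep, in every week, the names that rank in the top n by Value
    for their Product/Type in the latest week (hypothetical helper)."""
    df = df.copy()
    df["Time"] = pd.to_datetime(df["Time"])

    # Step 1: top n rows per Product/Type within the latest week.
    latest = df[df["Time"] == df["Time"].max()]
    top = (
        latest.sort_values("Value", ascending=False)
              .groupby(["Product", "Type"])
              .head(n)
    )

    # Step 2: an inner merge keeps only those combinations in all weeks.
    keys = top[["Product", "Type", "Name"]]
    return (
        df.merge(keys, on=["Product", "Type", "Name"])
          .sort_values(["Product", "Time", "Type", "Value"],
                       ascending=[True, True, True, False])
          .reset_index(drop=True)
    )

# Rebuild the sample table from the question (construction is an assumption).
names = ["CVS", "Cigna", "United", "Humana", "Centene"]
values = {
    ("Medicare", "2018-10-05"): [10, 20, 30, 40, 50],
    ("Comm", "2018-10-05"): [20, 30, 40, 50, 60],
    ("Medicare", "2019-10-03"): [30, 20, 10, 5, 12],
    ("Comm", "2019-10-03"): [87, 43, 50, 30, 90],
}
rows = [
    ("Product a", typ, name, time, value)
    for (typ, time), vals in values.items()
    for name, value in zip(names, vals)
]
df = pd.DataFrame(rows, columns=["Product", "Type", "Name", "Time", "Value"])

result = keep_latest_top_n(df, n=2)
print(result)
```

Passing the key columns to `on=` explicitly is a design choice: it avoids relying on merge inferring the shared columns, which matters if `df` later gains extra columns that also appear in the selection.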