Here is the sample data:
Product Type Name Time Value
Product a Medicare CVS 2018-10-05 10
Product a Medicare Cigna 2018-10-05 20
Product a Medicare United 2018-10-05 30
Product a Medicare Humana 2018-10-05 40
Product a Medicare Centene 2018-10-05 50
Product a Comm CVS 2018-10-05 20
Product a Comm Cigna 2018-10-05 30
Product a Comm United 2018-10-05 40
Product a Comm Humana 2018-10-05 50
Product a Comm Centene 2018-10-05 60
Product a Medicare CVS 2019-10-03 30
Product a Medicare Cigna 2019-10-03 20
Product a Medicare United 2019-10-03 10
Product a Medicare Humana 2019-10-03 5
Product a Medicare Centene 2019-10-03 12
Product a Comm CVS 2019-10-03 87
Product a Comm Cigna 2019-10-03 43
Product a Comm United 2019-10-03 50
Product a Comm Humana 2019-10-03 30
Product a Comm Centene 2019-10-03 90
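First, I need to find the most recent week in "Time".
In the table above that is 2019-10-03.
For that week, I need to rank by "Value" and find the top 2 "Name" entries for each "Type".
For the week of 2019-10-03, the top 2 "Name" entries for "Medicare" are CVS and Cigna. For the same week, the top 2 "Name" entries for "Comm" are Centene and CVS.
Then I need to create a dataframe like the one below, keeping only those names for every week: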
Product Type Name Time Value
Product a Medicare CVS 2018-10-05 10
Product a Medicare Cigna 2018-10-05 20
Product a Comm Centene 2018-10-05 60
Product a Comm CVS 2018-10-05 20
Product a Medicare CVS 2019-10-03 30
Product a Medicare Cigna 2019-10-03 20
Product a Comm Centene 2019-10-03 90
Product a Comm CVS 2019-10-03 87
Answer 0 (score: 1)
IIUC, first sort the dataframe, then group and use head:
df.sort_values('Value', ascending=False)\
.groupby(['Product', 'Type', 'Time'])\
.head(2)\
.sort_index()
Output:
Product Type Name Time Value
3 Product a Medicare Humana 2018-10-05 40
4 Product a Medicare Centene 2018-10-05 50
8 Product a Comm Humana 2018-10-05 50
9 Product a Comm Centene 2018-10-05 60
10 Product a Medicare CVS 2019-10-03 30
11 Product a Medicare Cigna 2019-10-03 20
15 Product a Comm CVS 2019-10-03 87
19 Product a Comm Centene 2019-10-03 90
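To reproduce the output above, here is a self-contained sketch. The way `df` is built is an assumption (the question only shows the printed table), and `top2_per_week` is just an illustrative name. Note that this approach takes the top 2 within each (Product, Type, Time) group independently, so the 2018 rows are Humana and Centene rather than the names that lead in the latest week.

```python
import pandas as pd

names = ["CVS", "Cigna", "United", "Humana", "Centene"]
# (Type, Time, values in the order of `names`), copied from the question's table
blocks = [
    ("Medicare", "2018-10-05", [10, 20, 30, 40, 50]),
    ("Comm", "2018-10-05", [20, 30, 40, 50, 60]),
    ("Medicare", "2019-10-03", [30, 20, 10, 5, 12]),
    ("Comm", "2019-10-03", [87, 43, 50, 30, 90]),
]
rows = [
    ("Product a", typ, name, time, value)
    for typ, time, values in blocks
    for name, value in zip(names, values)
]
df = pd.DataFrame(rows, columns=["Product", "Type", "Name", "Time", "Value"])

# Sort by Value, then keep the first two rows of every Product/Type/Time group
top2_per_week = (
    df.sort_values("Value", ascending=False)
      .groupby(["Product", "Type", "Time"])
      .head(2)
      .sort_index()  # restore the original row order
)
print(top2_per_week)
```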
Answer 1 (score: 1)
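First find the Product, Type and Name combinations that are in the top 2 for the latest datetime, then use merge to keep only those combinations across all datetimes: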
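# Parse the dates so that max() compares dates rather than strings
df['Time'] = pd.to_datetime(df['Time'])
# Top 2 names by Value per Product/Type within the latest week
df1 = (df[df['Time'].eq(df['Time'].max())]
       .sort_values('Value', ascending=False)
       .groupby(['Product', 'Type'])
       .head(2))
print(df1)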
Product Type Name Time Value
19 Product a Comm Centene 2019-10-03 90
15 Product a Comm CVS 2019-10-03 87
10 Product a Medicare CVS 2019-10-03 30
11 Product a Medicare Cigna 2019-10-03 20
# Keep only those Product/Type/Name combinations, for every week
df = (df.merge(df1[['Product', 'Type', 'Name']])
      .sort_values(['Product', 'Time', 'Type', 'Value'],
                   ascending=[True, True, True, False]))
print(df)
Product Type Name Time Value
6 Product a Comm Centene 2018-10-05 60
4 Product a Comm CVS 2018-10-05 20
2 Product a Medicare Cigna 2018-10-05 20
0 Product a Medicare CVS 2018-10-05 10
7 Product a Comm Centene 2019-10-03 90
5 Product a Comm CVS 2019-10-03 87
1 Product a Medicare CVS 2019-10-03 30
3 Product a Medicare Cigna 2019-10-03 20
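The two steps above can be wrapped into one function. This is only a sketch: `keep_latest_top_n` and its `n` parameter are hypothetical names, and the sample frame is a reduced subset of the question's data (three names per type) that I assembled for illustration. The index is reset at the end because the index labels left by merge can differ between pandas versions.

```python
import pandas as pd

def keep_latest_top_n(df, n=2):
    """Hypothetical helper: keep, for every date, the names that rank in the
    top n by Value on the latest date, per (Product, Type)."""
    out = df.copy()
    out["Time"] = pd.to_datetime(out["Time"])
    latest = out[out["Time"].eq(out["Time"].max())]
    top = (
        latest.sort_values("Value", ascending=False)
              .groupby(["Product", "Type"])
              .head(n)
    )
    keys = top[["Product", "Type", "Name"]]
    return (
        out.merge(keys, on=["Product", "Type", "Name"])
           .sort_values(["Product", "Time", "Type", "Value"],
                        ascending=[True, True, True, False])
           .reset_index(drop=True)
    )

# Reduced subset of the question's table (assumed construction)
sample = pd.DataFrame(
    [
        ("Product a", "Medicare", "CVS", "2018-10-05", 10),
        ("Product a", "Medicare", "Cigna", "2018-10-05", 20),
        ("Product a", "Medicare", "Centene", "2018-10-05", 50),
        ("Product a", "Comm", "CVS", "2018-10-05", 20),
        ("Product a", "Comm", "Cigna", "2018-10-05", 30),
        ("Product a", "Comm", "Centene", "2018-10-05", 60),
        ("Product a", "Medicare", "CVS", "2019-10-03", 30),
        ("Product a", "Medicare", "Cigna", "2019-10-03", 20),
        ("Product a", "Medicare", "Centene", "2019-10-03", 12),
        ("Product a", "Comm", "CVS", "2019-10-03", 87),
        ("Product a", "Comm", "Cigna", "2019-10-03", 43),
        ("Product a", "Comm", "Centene", "2019-10-03", 90),
    ],
    columns=["Product", "Type", "Name", "Time", "Value"],
)

result = keep_latest_top_n(sample, n=2)
print(result)
```

Passing a different `n` changes how many names per Product/Type are carried over from the latest week.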