I have a pandas DataFrame that contains start dates and measurement dates.
Start Date change Individual measured_date
0 2004-11-23 3341 Bob 2007-07-26
1 2006-06-29 3398 Bob 2007-07-26
2 1997-07-21 2277 Greg 2005-04-21
3 2000-04-11 3380 Nancy 2005-10-14
4 2000-04-11 3380 Nancy 2007-06-28
5 2005-03-29 3115 Nancy 2005-10-14
6 2005-03-29 3115 Nancy 2007-06-28
7 2005-10-15 4294 Nancy 2007-06-28
8 2007-03-16 2163 Nancy 2007-06-28
9 2006-02-18 2299 Jose 2009-12-23
10 2008-11-16 1983 Jose 2009-12-23
11 2009-04-07 2112 Jose 2009-12-23
12 2009-11-14 2036 Jose 2009-12-23
13 2009-11-24 2556 Jose 2009-12-23
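For each value in the "Individual" column, I want to select the row with the shortest time between the start date and the measurement date. I created a "diff" column for reference. For example, this DataFrame should be filtered to: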
StartDate change Individual measured_date diff
6/29/2006 3398 Bob 7/26/2007 392 days 00:00:00.000000000
7/21/1997 2277 Greg 4/21/2005 2831 days 00:00:00.000000000
3/16/2007 2163 Nancy 6/28/2007 104 days 00:00:00.000000000
11/24/2009 2556 Jose 12/23/2009 29 days 00:00:00.000000000
Answer 0 (score: 1)
Use sort_values, then drop_duplicates:
df.sort_values('diff').drop_duplicates('Individual')
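As a minimal sketch of this approach, assuming `diff` is computed as `measured_date` minus `Start Date` (the question does not show how it was built), and using a small frame rebuilt by hand from a few of the question's rows:

```python
import pandas as pd

# A few rows copied from the question, for illustration only.
df = pd.DataFrame({
    "Start Date": pd.to_datetime(
        ["2004-11-23", "2006-06-29", "1997-07-21", "2005-10-15", "2007-03-16"]
    ),
    "change": [3341, 3398, 2277, 4294, 2163],
    "Individual": ["Bob", "Bob", "Greg", "Nancy", "Nancy"],
    "measured_date": pd.to_datetime(
        ["2007-07-26", "2007-07-26", "2005-04-21", "2007-06-28", "2007-06-28"]
    ),
})

# Assumption: diff is the measurement date minus the start date.
df["diff"] = df["measured_date"] - df["Start Date"]

# Sort ascending so the smallest diff comes first for each person,
# then keep only the first row seen per Individual.
result = df.sort_values("diff").drop_duplicates("Individual")
print(result)
```

drop_duplicates keeps the first occurrence by default, so the sort order decides which row survives. Sorting in descending order would keep the longest interval per person instead.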
Answer 1 (score: 0)
You can use groupby with nsmallest.
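The answer gives no code, so the following is only one possible reading of it. It assumes the interval is stored as whole days in a hypothetical `diff_days` column, and it uses a small frame rebuilt by hand from a few of the question's rows:

```python
import pandas as pd

# A few rows copied from the question, for illustration only.
df = pd.DataFrame({
    "Start Date": pd.to_datetime(
        ["2004-11-23", "2006-06-29", "1997-07-21", "2005-10-15", "2007-03-16"]
    ),
    "change": [3341, 3398, 2277, 4294, 2163],
    "Individual": ["Bob", "Bob", "Greg", "Nancy", "Nancy"],
    "measured_date": pd.to_datetime(
        ["2007-07-26", "2007-07-26", "2005-04-21", "2007-06-28", "2007-06-28"]
    ),
})

# Hypothetical helper column: the interval expressed as an integer number of days.
df["diff_days"] = (df["measured_date"] - df["Start Date"]).dt.days

# nsmallest(1) per group returns a Series indexed by (Individual, original row label).
smallest = df.groupby("Individual")["diff_days"].nsmallest(1)

# Use the original row labels (last index level) to pull the full rows back out.
result = df.loc[smallest.index.get_level_values(-1)]
print(result)
```

Passing a larger n to nsmallest would keep the n shortest intervals per person rather than just one.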