I have a DataFrame in pandas like this:
id some_type some_date some_data
0 1 A 19/12/1995 X
1 2 A 10/04/1997 Y
2 2 B 05/03/2013 Z
3 2 B 09/05/2017 W
4 2 B 09/05/2017 R
5 3 A 01/07/1998 M
6 3 B 09/08/2009 N
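For each value of id, I need the rows with the maximum some_type and some_date, without dropping any of the some_data values.
In other words, what I need is the following: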
id some_type some_date some_data
0 1 A 19/12/1995 X
3 2 B 09/05/2017 W
4 2 B 09/05/2017 R
6 3 B 09/08/2009 N
Answer 0 (score: 2)
You can use sort_values, groupby and apply to keep the rows whose some_type and some_date match the last values in each group:
df_output = (df.sort_values(by=['some_type', 'some_date'])
               .groupby('id')
               .apply(lambda df_g: df_g[(df_g['some_type'] == df_g['some_type'].iloc[-1]) &
                                        (df_g['some_date'] == df_g['some_date'].iloc[-1])])
               .reset_index(level=0, drop=True))
The output is:
id some_type some_date some_data
0 1 A 1995-12-19 X
3 2 B 2017-09-05 W
4 2 B 2017-09-05 R
6 3 B 2009-09-08 N
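For reference, here is a self-contained sketch of the same "sort, then keep the rows matching the last one per id" idea, written without apply so it does not depend on how a given pandas version treats the grouping column inside apply. It assumes the dates are dd/mm/yyyy and parses them with an explicit format (so 09/05/2017 becomes 2017-05-09, not the 2017-09-05 shown above), and it assumes there are no missing values in some_type or some_date. The names df_sorted, last and mask are illustrative only.

```python
import io

import pandas as pd

text = """id some_type some_date some_data
1 A 19/12/1995 X
2 A 10/04/1997 Y
2 B 05/03/2013 Z
2 B 09/05/2017 W
2 B 09/05/2017 R
3 A 01/07/1998 M
3 B 09/08/2009 N"""

df = pd.read_csv(io.StringIO(text), sep=r"\s+")
# Assumption: the dates are day-first (dd/mm/yyyy).
df["some_date"] = pd.to_datetime(df["some_date"], format="%d/%m/%Y")

# Sort so that the last row of each id has the highest type, then the latest date.
df_sorted = df.sort_values(["id", "some_type", "some_date"])

# Broadcast each group's last (some_type, some_date) pair back onto its rows.
cols = ["some_type", "some_date"]
last = df_sorted.groupby("id")[cols].transform("last")

# Keep every row that equals its group's last pair, so ties are preserved.
mask = (df_sorted[cols] == last).all(axis=1)
df_output = df_sorted[mask]
print(df_output)
```

Because the filter is a boolean mask on the sorted frame, the original index labels are kept and tied rows (W and R here) both survive.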
Edit: if you don't care about the index, you can also use merge:
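# first, get the last row per id after sorting
# (double brackets select several columns; reset_index turns id back into a column)
df_last = (df.sort_values(['some_type', 'some_date'])
             .groupby('id')[['some_type', 'some_date']].last()
             .reset_index())
# now do an inner merge on id, some_type and some_date to keep the rows you want
df_output = df.merge(df_last, how='inner')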
Apart from the index, you will get the same result.
Answer 1 (score: 2)
Use max() with transform to create a mask and apply it. But first convert to datetime:
df['some_date'] = pd.to_datetime(df['some_date'], dayfirst=True)
m = df.groupby('id')[['some_type', 'some_date']].transform(lambda x: x == x.max()).all(axis=1)
df = df[m]
Full example:
import io
import pandas as pd
text = '''\
id some_type some_date some_data
1 A 19/12/1995 X
2 A 10/04/1997 Y
2 B 05/03/2013 Z
2 B 09/05/2017 W
2 B 09/05/2017 R
3 A 01/07/1998 M
3 B 09/08/2009 N'''
fileobj = io.StringIO(text)
df = pd.read_csv(fileobj, sep=r'\s+')
# the dates are dd/mm/yyyy, so parse them day-first
df['some_date'] = pd.to_datetime(df['some_date'], dayfirst=True)
# True where a row equals its group's maximum in both columns
m = df.groupby('id')[['some_type', 'some_date']].transform(lambda x: x == x.max()).all(axis=1)
df = df[m]
print(df)
Returns:
id some_type some_date some_data
0 1 A 1995-12-19 X
3 2 B 2017-05-09 W
4 2 B 2017-05-09 R
6 3 B 2009-08-09 N
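One thing worth checking with a column-wise max mask: the maximum of some_type and the maximum of some_date are computed independently, so if the latest date in a group sits on a row with a lower type, no row matches both. If the intent is "the latest date within the highest type", a two-step filter may be closer. The sketch below uses made-up data (not from the question) and illustrative names (naive, top_type, result) to show the difference.

```python
import pandas as pd

# Hypothetical data: the latest date (2020) belongs to type A, not the max type B.
df = pd.DataFrame({
    "id": [1, 1, 1],
    "some_type": ["A", "B", "B"],
    "some_date": pd.to_datetime(["2020-01-01", "2017-05-09", "2016-01-01"]),
    "some_data": ["P", "Q", "R"],
})

# Column-wise mask: requires max type AND max date on the same row.
g = df.groupby("id")
naive = df[(df["some_type"] == g["some_type"].transform("max"))
           & (df["some_date"] == g["some_date"].transform("max"))]

# Two-step filter: keep the max type per id first, then the max date within it.
top_type = df[df["some_type"] == df.groupby("id")["some_type"].transform("max")]
result = top_type[
    top_type["some_date"] == top_type.groupby("id")["some_date"].transform("max")
]

print(len(naive))   # no row satisfies both column-wise maxima
print(result)       # the B row with the latest B date
```

On the question's sample data both approaches select the same rows, so this only matters when the two maxima can fall on different rows.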