Question

The DataFrame looks like this: people's names and their weights on given dates.

  Name   date      w
1 Mike 2019-01-21 89.1
2 Mike 2018-11-12 88.1
3 Mike 2018-03-14 87.2
4 Hans 2019-03-21 66.5
5 Hans 2018-03-12 57.4
6 Hans 2017-04-21 55.3
7 Hans 2016-10-12 nan

I want to select the last time Hans logged his weight. So the answer should be

4 Hans 2019-03-21 66.5

This is what I have managed to do so far:

# select Hans data that don't have nans
cond = ( data['Name'] == 'Hans' )
a = data.loc[ cond ] 
a = a.dropna()       

# get the index of the most recent weight
b = a['date'].str.split('-', expand=True) # split the date into year, month, day

Now b looks like this

print(b)
#4 2019 03 21
#5 2018 03 12
#6 2017 04 21

How do I extract the row with index=4 and then get the weight?

I can't use idxmax because the values in the df are not floats but str.

Answer 1

You can't use idxmax, but one workaround is to use NumPy's argmax together with iloc:

df2 = df.query('Name == "Hans"')
# older versions
# df2.iloc[[df2['date'].values.argmax()]]
# >=0.24
df2.iloc[[df2['date'].to_numpy().argmax()]]

   Name        date     w
4  Hans  2019-03-21  66.5
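argmax works on the raw strings because ISO-formatted dates (YYYY-MM-DD) sort lexicographically in chronological order. Here is a minimal self-contained sketch, assuming the question's frame is rebuilt by hand with a 1-based index. The names hans, pos, latest and weight are illustrative. The NaN weight is dropped first, since the question asks for the last logged weight:

```python
import pandas as pd

# Assumed reconstruction of the frame shown in the question (1-based index).
df = pd.DataFrame(
    {
        "Name": ["Mike", "Mike", "Mike", "Hans", "Hans", "Hans", "Hans"],
        "date": ["2019-01-21", "2018-11-12", "2018-03-14",
                 "2019-03-21", "2018-03-12", "2017-04-21", "2016-10-12"],
        "w": [89.1, 88.1, 87.2, 66.5, 57.4, 55.3, float("nan")],
    },
    index=range(1, 8),
)

# Keep only Hans' rows that actually have a weight.
hans = df[df["Name"] == "Hans"].dropna(subset=["w"])

# argmax returns a position within hans, so it must be paired with iloc.
pos = hans["date"].to_numpy().argmax()
latest = hans.iloc[[pos]]      # one-row DataFrame
weight = hans["w"].iloc[pos]   # scalar weight
```

Note that the position must be computed on the same filtered frame that iloc is applied to. A position taken from the full frame would point at a different row.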

Another trick is to convert the date to an integer using to_datetime. You can then use idxmax with loc as usual:

df2.loc[[pd.to_datetime(df2['date']).astype('int64').idxmax()]]

   Name        date     w
4  Hans  2019-03-21  66.5
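idxmax also accepts datetime values directly, so the integer cast is optional. The sketch below assumes the question's frame is rebuilt by hand, and the variable names are illustrative. It also pulls out the weight itself, which is what the question ultimately asks for:

```python
import pandas as pd

# Assumed reconstruction of the frame shown in the question (1-based index).
df = pd.DataFrame(
    {
        "Name": ["Mike", "Mike", "Mike", "Hans", "Hans", "Hans", "Hans"],
        "date": ["2019-01-21", "2018-11-12", "2018-03-14",
                 "2019-03-21", "2018-03-12", "2017-04-21", "2016-10-12"],
        "w": [89.1, 88.1, 87.2, 66.5, 57.4, 55.3, float("nan")],
    },
    index=range(1, 8),
)

# Hans' rows with a recorded weight.
hans = df[df["Name"] == "Hans"].dropna(subset=["w"])

# idxmax returns an index label (not a position), so use loc.
label = pd.to_datetime(hans["date"]).idxmax()
row = df.loc[[label]]        # one-row DataFrame
weight = df.loc[label, "w"]  # scalar weight
```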

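To do this for each person, use GroupBy.idxmax. Because idxmax returns index labels, select the rows with loc rather than iloc:

df.loc[pd.to_datetime(df.date).astype('int64').groupby(df['Name']).idxmax().values]

   Name        date     w
4  Hans  2019-03-21  66.5
1  Mike  2019-01-21  89.1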
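If the goal is the last logged weight per person, rows with a missing weight may need to be dropped before grouping. Otherwise a person's most recent row could be one with no weight recorded. A possible sketch, assuming the question's data is rebuilt by hand (logged and latest are illustrative names):

```python
import pandas as pd

# Assumed reconstruction of the frame shown in the question (1-based index).
df = pd.DataFrame(
    {
        "Name": ["Mike", "Mike", "Mike", "Hans", "Hans", "Hans", "Hans"],
        "date": ["2019-01-21", "2018-11-12", "2018-03-14",
                 "2019-03-21", "2018-03-12", "2017-04-21", "2016-10-12"],
        "w": [89.1, 88.1, 87.2, 66.5, 57.4, 55.3, float("nan")],
    },
    index=range(1, 8),
)

# Drop rows without a weight and parse the dates once.
logged = df.dropna(subset=["w"]).assign(date=lambda d: pd.to_datetime(d["date"]))

# One index label per person, pointing at that person's latest date.
labels = logged.groupby("Name")["date"].idxmax().tolist()

# Labels go with loc; iloc would treat them as positions.
latest = logged.loc[labels]
```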


Select a row based on the max value of a column (Python)
