I have a DataFrame called weights:
| person | age | weight_at_time_1 | weight_at_time_2 |
| Joe | 23 | 280 | 240 |
| Mary | 19 | 111 | 90 |
| Tom | 34 | 150 | 100 |
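I want to find the greatest weight loss (strictly speaking, the largest difference in weight) and report the weight_at_time_1 and weight_at_time_2 that produced it, to put the loss in context, along with the name of the person who lost it.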
weights['delta_weight'] = weights['weight_at_time_2'] - weights['weight_at_time_1']
weights['delta_weight'].min()
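This tells me that the largest negative change (the greatest weight loss) is -50.
I want to report the weight_at_time_1 and weight_at_time_2 that produced this min(). Is there a way to retrieve the index of the row where min() was found, or do I have to loop over the DataFrame and keep track of it myself?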
Answer 0: (score: 2)
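You can use idxmin, which returns the index label of the row holding the minimum. With the change computed as weight_at_time_2 - weight_at_time_1, the minimum is the greatest loss:
df.loc[[(df.weight_at_time_2 - df.weight_at_time_1).idxmin()], :]
person age weight_at_time_1 weight_at_time_2
2 Tom 34 150 100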
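For reference, here is a minimal self-contained sketch of the idxmin approach on the sample data from the question. The variable names `delta`, `best_label` and `best_row` are illustrative only, and the change is computed as weight_at_time_2 - weight_at_time_1, so the most negative value is the greatest loss.

```python
import pandas as pd

# Sample data copied from the question
weights = pd.DataFrame({
    "person": ["Joe", "Mary", "Tom"],
    "age": [23, 19, 34],
    "weight_at_time_1": [280, 111, 150],
    "weight_at_time_2": [240, 90, 100],
})

# Negative values mean weight was lost
delta = weights["weight_at_time_2"] - weights["weight_at_time_1"]

# idxmin returns the index label of the first smallest value
best_label = delta.idxmin()

# .loc with a single label returns that row as a Series
best_row = weights.loc[best_label]

print(best_row["person"],
      best_row["weight_at_time_1"],
      best_row["weight_at_time_2"],
      delta[best_label])
# prints: Tom 150 100 -50
```

Passing the label inside a list, as in `weights.loc[[best_label]]`, keeps the result as a one-row DataFrame instead of a Series.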
Answer 1: (score: 2)
If several rows may tie for the max/min, you can also use this, which returns all of them:
delta = df.weight_at_time_2 - df.weight_at_time_1
df.loc[delta == delta.min()]
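To see why this matters, here is a small sketch with a tie. The row for "Ann" is made up for illustration and is not part of the question's data. idxmin only reports the first matching label, while the Boolean mask keeps every tied row.

```python
import pandas as pd

# Question's data plus one hypothetical row ("Ann") that ties with Tom at -50
df = pd.DataFrame({
    "person": ["Joe", "Mary", "Tom", "Ann"],
    "age": [23, 19, 34, 41],
    "weight_at_time_1": [280, 111, 150, 200],
    "weight_at_time_2": [240, 90, 100, 150],
})

delta = df["weight_at_time_2"] - df["weight_at_time_1"]

# idxmin returns only the first label where the minimum occurs
first_only = df.loc[[delta.idxmin()]]

# The Boolean mask selects every row equal to the minimum
all_tied = df.loc[delta == delta.min()]

print(first_only["person"].to_list())  # ['Tom']
print(all_tied["person"].to_list())    # ['Tom', 'Ann']
```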
In response to your comment:
In [3]: delta = df.weight_at_time_2 - df.weight_at_time_1
In [4]: bool_idx = delta == delta.min()
# This is *Boolean indexing*: a Boolean vector is used
# to filter rows out of a DataFrame
In [5]: bool_idx
Out[5]:
0 False
1 False
2 True
dtype: bool
# These two lines are equivalent; the result is a DataFrame
# containing every row whose position in `bool_idx`
# holds True
# In [6]: df.loc[bool_idx]
In [6]: df.loc[bool_idx, :]
Out[6]:
person age weight_at_time_1 weight_at_time_2
2 Tom 34 150 100
# By also specifying a column label, we get a Series out of the
# filtered DataFrame
In [7]: df.loc[bool_idx, 'person']
Out[7]:
2 Tom
Name: person, dtype: object
# To drop the Series data structure
# - use `.values` property to get a `numpy.ndarray`
# - use `.to_list()` method to get a list
In [8]: df.loc[bool_idx, 'person'].values
Out[8]: array(['Tom'], dtype=object)
In [9]: df.loc[bool_idx, 'person'].to_list()
Out[9]: ['Tom']
# At this point there are several ways to get just
# the string 'Tom' out of the results above :)
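As one possible way to finish that last step, the sketch below pulls the plain string out of the filtered result. It rebuilds the question's data so it can run on its own; the names `names`, `first_name`, `from_array` and `via_label` are illustrative, and it assumes at least one row matches the mask.

```python
import pandas as pd

df = pd.DataFrame({
    "person": ["Joe", "Mary", "Tom"],
    "age": [23, 19, 34],
    "weight_at_time_1": [280, 111, 150],
    "weight_at_time_2": [240, 90, 100],
})
delta = df["weight_at_time_2"] - df["weight_at_time_1"]

# Series of names for every row matching the minimum
names = df.loc[delta == delta.min(), "person"]

# Positional access: first element of the filtered Series
first_name = names.iloc[0]

# Same idea through the underlying numpy array
from_array = names.values[0]

# Label-based access: scalar row label plus column label returns a scalar
via_label = df.loc[delta.idxmin(), "person"]

print(first_name, from_array, via_label)
# prints: Tom Tom Tom
```

If ties are possible, `names.iloc[0]` silently picks the first match, so check `len(names)` first when that distinction matters.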
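By the way, @WeNYoBen's answer takes the Selection By Label approach, while this answer takes the Selection By Boolean Indexing approach.
For a deeper understanding, I also recommend reading through the excellent official pandas documentation on Indexing and Selecting Data.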