Question

I have a DataFrame called weights:

| person | age | weight_at_time_1 | weight_at_time_2 |
| Joe    | 23  | 280              | 240              |  
| Mary   | 19  | 111              | 90               |    
| Tom    | 34  | 150              | 100              |

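I want to find the largest weight loss (strictly speaking, the largest difference in weight), see which weight_at_time_1 and weight_at_time_2 produced it so I can judge how significant the loss was, and get the name of the person who lost it.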

weights['delta_weight'] = weights['weight_at_time_2'] - weights['weight_at_time_1']
weights['delta_weight'].min()

This tells me that the largest negative change (the greatest weight loss) is -50.

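I want to report the weight_at_time_1 and weight_at_time_2 that produced this min(). Is there a way to retrieve the index of the row where min() was found, or do I have to loop over the DataFrame and keep track of it myself?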

Answer 1

Here is one way, using idxmin:

df.loc[[(df.weight_at_time_2 - df.weight_at_time_1).idxmin()], :]
  person  age  weight_at_time_1  weight_at_time_2
2    Tom   34               150               100
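
As a self-contained sketch of the idxmin approach, the snippet below rebuilds the question's table (assumed here to be held in a variable named `df`) and takes the delta as weight_at_time_2 minus weight_at_time_1, so the most negative value marks the largest loss. The names `delta`, `label`, and `row` are only illustrative.

```python
import pandas as pd

# Rebuild the table from the question (assumed variable name: df).
df = pd.DataFrame({
    "person": ["Joe", "Mary", "Tom"],
    "age": [23, 19, 34],
    "weight_at_time_1": [280, 111, 150],
    "weight_at_time_2": [240, 90, 100],
})

# A negative delta means weight was lost; the most negative is the largest loss.
delta = df["weight_at_time_2"] - df["weight_at_time_1"]

# idxmin() returns the index label of the first minimum value.
label = delta.idxmin()

# .loc with a single label returns that row as a Series.
row = df.loc[label]
print(label, row["person"], row["weight_at_time_1"], row["weight_at_time_2"])
```

Passing the label inside a list, as in `df.loc[[label], :]`, would return a one-row DataFrame instead of a Series.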

Answer 2

If there can be more than one max/min, you can also use this:

delta = df.weight_at_time_2 - df.weight_at_time_1
df.loc[delta == delta.min()]
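
To illustrate why the mask matters when there are ties, here is a small sketch on hypothetical data (the `tied` frame is invented for this example and is not the question's data) in which two people share the largest loss. `idxmin()` reports only the first of them, while the boolean mask keeps both.

```python
import pandas as pd

# Hypothetical data with a tie: Joe and Tom both lose 50.
tied = pd.DataFrame({
    "person": ["Joe", "Mary", "Tom"],
    "weight_at_time_1": [290, 111, 150],
    "weight_at_time_2": [240, 90, 100],
})
delta = tied["weight_at_time_2"] - tied["weight_at_time_1"]

# idxmin() gives a single label: the first occurrence of the minimum.
first_only = tied.loc[[delta.idxmin()]]

# The boolean mask selects every row whose delta equals the minimum.
all_ties = tied.loc[delta == delta.min()]

print(first_only["person"].to_list())
print(all_ties["person"].to_list())
```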

In response to your comment:

In [3]: delta = df.weight_at_time_2 - df.weight_at_time_1

In [4]: bool_idx = delta == delta.min()

# This is *Boolean indexing*: a boolean vector is used
# to filter rows out of a DataFrame
In [5]: bool_idx
Out[5]:
0    False
1    False
2     True
dtype: bool

# These two lines are equivalent. The result is a DataFrame
# containing the rows whose position in `bool_idx`
# holds True
# In [6]: df.loc[bool_idx]
In [6]: df.loc[bool_idx, :]
Out[6]:
  person  age  weight_at_time_1  weight_at_time_2
2    Tom   34               150               100

# By also specifying a column label, we get a Series out of the
# filtered DataFrame
In [7]: df.loc[bool_idx, 'person']
Out[7]:
2    Tom
Name: person, dtype: object

# To get the values out of the Series
#    - use `.values` property to get a `numpy.ndarray`
#    - use `.to_list()` method to get a list
In [8]: df.loc[bool_idx, 'person'].values
Out[8]: array(['Tom'], dtype=object)

In [9]: df.loc[bool_idx, 'person'].to_list()
Out[9]: ['Tom']

# By now you should know several ways to get
# just the string 'Tom' out of the results above :)

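By the way, @WeNYoBen's excellent answer uses Selection By Label, while this answer uses Selection By Boolean Indexing.

For a better understanding, I also recommend reading through the excellent official pandas documentation on Indexing and Selecting Data.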
