Question

My df looks like:

import pandas as pd
import numpy as np
d = {'Hours':np.arange(12, 97, 12),
     'Average':np.random.random(8),
     'Count':[500, 250, 125, 75, 60, 25, 5, 15]}
df = pd.DataFrame(d)

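The number of cases (Count) in this df generally decreases from row to row. Once the count drops below a certain threshold, I want to drop the rest of the frame, e.g. once the threshold of < 10 cases is reached.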

Start:

    Average     Count   Hours
0   0.560671    500     12
1   0.743811    250     24
2   0.953704    125     36
3   0.313850    75      48
4   0.640588    60      60
5   0.591149    25      72
6   0.302894    5       84
7   0.418912    15      96

Finished (row 6 and everything after it dropped):

    Average     Count   Hours
0   0.560671    500     12
1   0.743811    250     24
2   0.953704    125     36
3   0.313850    75      48
4   0.640588    60      60
5   0.591149    25      72
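As a reproducible check of the expected result, here is a minimal sketch that rebuilds the Start frame from the values printed above (instead of np.random) and keeps only the rows before the first Count below 10. It uses a cumulative boolean mask; this is an alternative technique chosen for illustration, not the asker's code or the answer below. The variable name `threshold` is an assumption.

```python
import pandas as pd

# Rebuild the "Start" frame from the values printed above (no randomness)
df = pd.DataFrame({
    'Average': [0.560671, 0.743811, 0.953704, 0.313850,
                0.640588, 0.591149, 0.302894, 0.418912],
    'Count': [500, 250, 125, 75, 60, 25, 5, 15],
    'Hours': [12, 24, 36, 48, 60, 72, 84, 96],
})

threshold = 10  # hypothetical name for the "< 10 cases" cutoff
# True for rows whose Count is below the threshold
below = df['Count'] < threshold
# cumsum() stays 0 until the first True, then is >= 1 for that row
# and every row after it, so later rows with Count >= 10 are dropped too
result = df[below.cumsum() == 0]
print(result)
```

Because the mask is based on a running count rather than on index labels, it does not depend on the frame having a 0..n-1 index, and it simply returns the whole frame when no row is below the threshold.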

Answer 1

We can take the index produced by boolean indexing and use it to slice the df with iloc:

In [58]:

df.iloc[:df[df.Count < 10].index[0]]
Out[58]:
    Average  Count  Hours
0  0.183016    500     12
1  0.046221    250     24
2  0.687945    125     36
3  0.387634     75     48
4  0.167491     60     60
5  0.660325     25     72
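Two caveats apply to the one-liner: it passes an index *label* to iloc, which expects *positions*, so it only works because df has the default 0..n-1 index; and if no row is below the threshold, `.index[0]` raises IndexError on the empty index. A minimal sketch that guards against both, using a hypothetical helper `truncate_at_threshold` and `np.flatnonzero` to get row positions:

```python
import numpy as np
import pandas as pd


def truncate_at_threshold(df, column, threshold):
    """Hypothetical helper: return the rows of df before the first row
    where df[column] < threshold, or a copy of df if there is none."""
    # 0-based positions of every row that falls below the threshold
    hits = np.flatnonzero(df[column].to_numpy() < threshold)
    if hits.size == 0:
        # No match: df[...].index[0] would raise IndexError here
        return df.copy()
    # iloc slices by position, so this works for any index, not just 0..n-1
    return df.iloc[:hits[0]]


d = {'Hours': np.arange(12, 97, 12),
     'Average': np.random.random(8),
     'Count': [500, 250, 125, 75, 60, 25, 5, 15]}
df = pd.DataFrame(d)
print(truncate_at_threshold(df, 'Count', 10))
```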

Just to break down what is happening here:

In [54]:
# use a boolean mask to index into the df
df[df.Count < 10]
Out[54]:
    Average  Count  Hours
6  0.244839      5     84

In [56]:
# we want the index and can subscript the first element using [0]
df[df.Count < 10].index
Out[56]:
Int64Index([6], dtype='int64')
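The final step, slicing with iloc, can be spelled out the same way. This sketch reuses the question's df construction; the point it shows is that an iloc slice excludes its stop position, which is why row 6 itself is dropped along with everything after it. (In pandas 2.0 and later, Int64Index no longer exists and the index above prints as `Index([6], dtype='int64')` instead.)

```python
import numpy as np
import pandas as pd

d = {'Hours': np.arange(12, 97, 12),
     'Average': np.random.random(8),
     'Count': [500, 250, 125, 75, 60, 25, 5, 15]}
df = pd.DataFrame(d)

# Step 1: boolean mask, True where Count is below the threshold
mask = df.Count < 10
# Step 2: labels of the matching rows; [0] takes the first one
first = df[mask].index[0]
print(first)  # prints 6
# Step 3: the slice stop is exclusive, so rows 0..5 are kept
truncated = df.iloc[:first]
print(truncated)
```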

Drop rows below the first row that meets a threshold in a pandas DataFrame