Return the 6th largest value in a row from a Python DataFrame

Posted: 2019-11-19 21:46:27

Tags: python pandas dataframe

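For each row of df, I want to take the sixth largest value across its 10 columns and store it in a new column, called "6th Largest" in this example. In many rows of df, more than one column may hold the sixth largest value. Whether there is one such column or several, I only need the actual 6th largest value returned.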

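Several options from similar questions here did not help, because they are usually specific to the maximum value (which I have already been able to get) or only to the first and second largest values.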


import pandas as pd

#what the actual df might look like

data_actual = [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 5, 6, 7, 8, 9,10]]

df_actual=pd.DataFrame(data_actual, columns=['1st','2nd','3rd','4th','5th','6th',
                                                 '7th','8th','9th','10th'])

#what I want the df to look like after the calculation, returning the 6th largest value.

data_want = [[0, 1, 2, 3, 5, 5, 6, 7, 8, 9, 5], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 5]]

df_want=pd.DataFrame(data_want, columns=['1st','2nd','3rd','4th','5th','6th',
                                             '7th','8th','9th','10th', '6th Largest'])   
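When values repeat, "sixth largest" can be read in two ways. A minimal NumPy sketch contrasting them, using a hypothetical sample row modeled on the first row of df_want (where 5 appears twice): counting repeated values separately versus counting each distinct value once. The question's wording ("whether one or several") suggests the first reading, but that is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical sample row: the value 5 appears twice
row = pd.Series([0, 1, 2, 3, 5, 5, 6, 7, 8, 9])

# Reading 1: repeated values count separately.
# Sort ascending, reverse to descending, take the 6th entry (index 5).
with_duplicates = np.sort(row.to_numpy())[::-1][5]

# Reading 2: each distinct value counts once.
# np.unique returns the sorted distinct values.
distinct_only = np.unique(row.to_numpy())[::-1][5]

print(with_duplicates, distinct_only)  # 5 3
```

All of the answers below follow the first reading.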

4 answers:

Answer 0 (score: 2)

Using rank:


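# rank 1 = largest value in the row; method='first' gives tied values distinct ranks,
# so exactly one cell per row has rank 6 and max(axis=1) returns its value
df_actual['6th Largest'] = df_actual.where(
    df_actual.rank(axis=1, ascending=False, method='first') == 6).max(axis=1)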
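For the rank approach, how ties are broken matters. A minimal sketch on hypothetical data whose first row repeats the value 5: with the default method='average' the two 5s share rank 5.5, so no cell in that row equals rank 6; with method='first' ties get distinct ranks in column order, so every row has exactly one rank-6 cell.

```python
import pandas as pd

# Hypothetical data: the first row contains the value 5 twice
df = pd.DataFrame([[0, 1, 2, 3, 5, 5, 6, 7, 8, 9],
                   [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])

# Default method='average': tied values share the mean of their ranks,
# so row 0 has no cell with rank exactly 6.
avg_ranks = df.rank(axis=1, ascending=False)
print((avg_ranks == 6).sum(axis=1).tolist())  # [0, 1]

# method='first': ties are ranked in the order they appear in the row.
first_ranks = df.rank(axis=1, ascending=False, method='first')
# where() turns every other cell into NaN, hence the float result.
sixth = df.where(first_ranks == 6).max(axis=1)
print(sixth.tolist())  # [5.0, 5.0]
```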

Answer 1 (score: 1)

The simplest way is to sort each row and pick out the element you need:

import numpy as np
# np.sort sorts ascending; with 10 columns the 6th largest is at index 4 (i.e. -6)
df['6th Largest'] = np.sort(df.values, axis=1)[:, 4]
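The sort approach generalizes to any n by indexing from the end of each sorted row, so it does not depend on the column count. A sketch with a hypothetical helper, nth_largest (not part of pandas or NumPy), assuming all columns are numeric:

```python
import numpy as np
import pandas as pd


def nth_largest(df: pd.DataFrame, n: int) -> np.ndarray:
    """Hypothetical helper: n-th largest value of each row, duplicates counted separately.

    Assumes every column is numeric and 1 <= n <= number of columns.
    """
    if not 1 <= n <= df.shape[1]:
        raise ValueError(f"n must be between 1 and {df.shape[1]}, got {n}")
    # Ascending sort per row; the n-th largest sits n positions from the end.
    return np.sort(df.to_numpy(), axis=1)[:, -n]


df = pd.DataFrame([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                   [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
print(nth_largest(df, 6))  # [4 5]
```

If the result is assigned back as a new column, call the helper before adding that column (or select only the value columns), otherwise the new column would be included in later computations.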

Answer 2 (score: 1)

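Partition the array, since you only care about the nth element being in the right position. Of course, this assumes you have at least n elements.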


import numpy as np
np.partition(df.to_numpy(), -6, axis=1)[:, -6]

array([4, 5], dtype=int64)
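The partition guarantee can be checked directly. This sketch uses random integer data (an assumption: no NaN values) and confirms that column -6 after np.partition matches the result of a full sort, with every element to its left no larger than it and every element to its right no smaller.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.integers(0, 1000, size=(200, 10))

# np.partition puts the element that belongs at index -6 (in ascending order)
# into that slot; smaller elements end up before it and larger ones after it,
# each side in no particular order.
part = np.partition(values, -6, axis=1)

# Column -6 is therefore the 6th largest value, the same as with a full sort.
assert np.array_equal(part[:, -6], np.sort(values, axis=1)[:, -6])

# Elements left of the pivot are <= it, elements right of it are >= it.
assert (part[:, :-6] <= part[:, [-6]]).all()
assert (part[:, -5:] >= part[:, [-6]]).all()
```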

Timings

In [6]: df = pd.DataFrame(np.random.randint(0, 1000, (1000, 1000)))

In [7]: %timeit np.sort(df.values, axis=1)[:, -6]
38.4 ms ± 1.48 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [8]: %timeit np.partition(df.to_numpy(), -6, axis=1)[:, -6]
8.52 ms ± 191 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
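To reproduce the comparison outside IPython, here is a sketch using the standard timeit module on a plain NumPy array rather than a DataFrame (an assumption for simplicity). np.partition defaults to the introselect algorithm, which is linear on average per row, while a full sort costs O(m log m) for m columns, so partition is expected to be faster; absolute numbers depend on the machine.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
values = rng.integers(0, 1000, size=(1000, 1000))


def by_sort():
    # Full ascending sort of every row, then take the 6th element from the end
    return np.sort(values, axis=1)[:, -6]


def by_partition():
    # Only guarantees that index -6 holds the correct value
    return np.partition(values, -6, axis=1)[:, -6]


# Both approaches must agree before their speed is worth comparing.
assert np.array_equal(by_sort(), by_partition())

for fn in (by_sort, by_partition):
    # Best of 3 repeats, each timing 5 calls
    best = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {best * 1000:.2f} ms per call")
```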

Answer 3 (score: 0)

You can use the apply function here:

import pandas as pd

data_actual = [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]

df_actual = pd.DataFrame(data_actual, columns=['1st', '2nd', '3rd', '4th', '5th', '6th',
                                               '7th', '8th', '9th', '10th'])


def get_sixth(row):
    # sort the row's values in ascending order; the 6th largest is 6th from the end
    values = sorted(row.tolist())
    return values[-6]

df_actual["6th Largest"] = df_actual.apply(get_sixth, axis=1)  # axis=1 passes each row to get_sixth
print(df_actual)

Output:

   1st  2nd  3rd  4th  5th  6th  7th  8th  9th  10th  6th Largest
0    0    1    2    3    4    5    6    7    8     9            4
1    1    2    3    4    5    6    7    8    9    10            5
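Because apply calls a Python function once per row, it is typically much slower than the vectorized NumPy answers on large frames. A quick sketch (random data, hypothetical sizes) checking that an apply-based version, which takes the 6th element from the end of each ascending-sorted row, agrees with np.partition:

```python
import numpy as np
import pandas as pd


def get_sixth(row: pd.Series):
    # Ascending sort, then the 6th element from the end is the 6th largest
    return sorted(row.tolist())[-6]


rng = np.random.default_rng(1)
df = pd.DataFrame(rng.integers(0, 100, size=(50, 10)))

# Row-wise Python function vs. one vectorized NumPy call
via_apply = df.apply(get_sixth, axis=1).to_numpy()
via_partition = np.partition(df.to_numpy(), -6, axis=1)[:, -6]

assert np.array_equal(via_apply, via_partition)
```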