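I want to return the 6th largest value across the 10 columns of each row of a df to a new column, called "6th Largest" in this example. In many cases throughout the df, more than one entry in a row may share the 6th largest value. Whether it is one or several, I just need the actual 6th largest value returned.
Several options from similar questions here did not help, because they are usually specific to the maximum (which I have already been able to use) or only cover the first and second values.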
import pandas as pd
#what the actual df might look like
data_actual = [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 5, 6, 7, 8, 9,10]]
df_actual=pd.DataFrame(data_actual, columns=['1st','2nd','3rd','4th','5th','6th',
'7th','8th','9th','10th'])
#what I want the df to look like after the calculation, returning the 6th largest value.
data_want = [[0, 1, 2, 3, 5, 5, 6, 7, 8, 9, 5], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 5]]
df_want=pd.DataFrame(data_want, columns=['1st','2nd','3rd','4th','5th','6th',
'7th','8th','9th','10th', '6th Largest'])
Answer 0 (score: 2)
Use where with rank:
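# Rank each row in descending order (ties broken by position), keep only the cell ranked 6, and collapse to one value per row
df_actual['6th Largest'] = df_actual.where(df_actual.rank(axis=1, method='first', ascending=False) == 6).max(axis=1)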
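Because the question mentions ties, it may help to see how rank treats them. The following is a minimal sketch on a hypothetical row (the Series `row` and its values are an assumption chosen to contain a tie, not data from the answer): with the default method, tied values share an averaged rank, so no entry is ranked exactly 6, while method='first' assigns distinct ranks.

```python
import pandas as pd

# Hypothetical row in which the value 5 appears twice
row = pd.Series([0, 1, 2, 3, 5, 5, 6, 7, 8, 9])

# Default method='average': the two 5s share rank 5.5
avg_ranks = row.rank(ascending=False)

# method='first': the two 5s get ranks 5 and 6, in order of appearance
first_ranks = row.rank(ascending=False, method='first')

# With averaged ranks nothing is ranked exactly 6 in this row
has_rank_six_avg = (avg_ranks == 6).any()

# With method='first' exactly one entry is ranked 6
sixth_largest = row[first_ranks == 6].iloc[0]

print(has_rank_six_avg, sixth_largest)
```

So a comparison against an exact rank is only reliable when the tie-breaking method guarantees that the rank exists in every row.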
Answer 1 (score: 1)
The simplest way to do this is to sort and extract:
import numpy as np
# np.sort sorts each row in ascending order, so index -6 holds the 6th largest value
df['6th Largest'] = np.sort(df.values, axis=1)[:, -6]
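To illustrate how the sort-and-extract approach behaves when values tie, here is a small sketch. The frame `df_ties` is a hypothetical one-row example (an assumption, not the asker's data). Sorting counts each duplicate separately; the variant using np.unique is only relevant if repeats should be ignored, which the question does not appear to ask for.

```python
import numpy as np
import pandas as pd

# Hypothetical row with a tie: the value 5 appears twice
df_ties = pd.DataFrame([[0, 1, 2, 3, 5, 5, 6, 7, 8, 9]])

# Ascending sort, then take the 6th position counted from the end;
# duplicates each occupy their own position
sixth_with_ties = np.sort(df_ties.to_numpy(), axis=1)[:, -6]

# Variant: 6th largest among the distinct values of the single row
sixth_distinct = np.unique(df_ties.to_numpy()[0])[-6]

print(sixth_with_ties, sixth_distinct)
```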
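Answer 2 (score: 1)
Partition the array, since you only care that the nth element ends up in its correct position. This of course assumes you have at least n elements.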
np.partition(df.to_numpy(), -6, axis=1)[:, -6]
array([4, 5], dtype=int64)
Timings
In [6]: df = pd.DataFrame(np.random.randint(0, 1000, (1000, 1000)))
In [7]: %timeit np.sort(df.values, axis=1)[:, -6]
38.4 ms ± 1.48 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [8]: %timeit np.partition(df.to_numpy(), -6, axis=1)[:, -6]
8.52 ms ± 191 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
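The partition approach can be wrapped in a small helper that also checks the "at least n elements" assumption. This is only a sketch: the function name `nth_largest_per_row` and its parameters are hypothetical, and it assumes every selected column is numeric.

```python
import numpy as np
import pandas as pd

def nth_largest_per_row(frame, n):
    """Return the n-th largest value of each row (duplicates count separately)."""
    values = frame.to_numpy()
    if values.shape[1] < n:
        raise ValueError(f"need at least {n} columns, got {values.shape[1]}")
    # np.partition guarantees only that position -n holds the value it
    # would hold in a fully sorted row; the other entries stay unordered
    return np.partition(values, -n, axis=1)[:, -n]

data_actual = [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]
df_actual = pd.DataFrame(data_actual, columns=['1st', '2nd', '3rd', '4th', '5th',
                                               '6th', '7th', '8th', '9th', '10th'])

# Select the value columns explicitly so a rerun does not include the result column
value_cols = ['1st', '2nd', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', '10th']
df_actual['6th Largest'] = nth_largest_per_row(df_actual[value_cols], 6)
print(df_actual)
```

Selecting the value columns explicitly is a design choice: once the result column exists, partitioning the whole frame would include it in the calculation.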
Answer 3 (score: 0)
You can use the apply function here:
data_actual = [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]
df_actual=pd.DataFrame(data_actual, columns=['1st','2nd','3rd','4th','5th','6th',
'7th','8th','9th','10th'])
def get_sixth(row):
    row = row.tolist()
    row.sort(reverse=True)  # descending order, so index 5 is the 6th largest value
    return row[5]
df_actual["6th Largest"] = df_actual.apply(get_sixth, axis=1)  # axis=1 is necessary so that each whole row is passed to the function
print(df_actual)
Output:
   1st  2nd  3rd  4th  5th  6th  7th  8th  9th  10th  6th Largest
0    0    1    2    3    4    5    6    7    8     9            4
1    1    2    3    4    5    6    7    8    9    10            5
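As an alternative to sorting by hand inside apply, the row-wise function could lean on Series.nlargest. This is a sketch, not part of the answer above; the variable name `sixth` is hypothetical. nlargest(6) returns the six largest entries in descending order, with duplicates kept, so the last of them is the 6th largest.

```python
import pandas as pd

data_actual = [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]
df_actual = pd.DataFrame(data_actual, columns=['1st', '2nd', '3rd', '4th', '5th',
                                               '6th', '7th', '8th', '9th', '10th'])

# Each row arrives as a Series; take its six largest values and keep the last one
sixth = df_actual.apply(lambda row: row.nlargest(6).iloc[-1], axis=1)
df_actual['6th Largest'] = sixth
print(df_actual)
```

Like any apply over rows, this runs Python code per row, so it is likely to be slower than the NumPy-based answers on large frames.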