How to return the value of the last element greater than x

Time: 2019-12-28 11:55:01

Tags: python pandas pandas-groupby

data_dict = {'obj': {0: 'obj1', 1: 'obj1', 2: 'obj1', 3: 'obj1', 4: 'obj1', 5: 'obj1', 6: 'obj1', 7: 'obj1', 8: 'obj2', 9: 'obj2', 10: 'obj2', 11: 'obj2', 12: 'obj2', 13: 'obj2', 14: 'obj2', 15: 'obj2', 16: 'obj3', 17: 'obj3', 18: 'obj3', 19: 'obj3', 20: 'obj3', 21: 'obj3', 22: 'obj3', 23: 'obj3'}, 'seq': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 1, 9: 2, 10: 3, 11: 4, 12: 5, 13: 6, 14: 7, 15: 8, 16: 1, 17: 2, 18: 3, 19: 4, 20: 5, 21: 6, 22: 7, 23: 8}, 'var': {0: 1900.0, 1: 3100.0, 2: 100.0, 3: 7800.0, 4: 1300.0, 5: 100.0, 6: 400.0, 7: 4800.0, 8: 1900.0, 9: 2600.0, 10: 600.0, 11: 7800.0, 12: 1300.0, 13: 100.0, 14: 400.0, 15: 4800.0, 16: 1900.0, 17: 2600.0, 18: 500.0, 19: 7900.0, 20: 1800.0, 21: 4800.0, 22: 300.0, 23: 300.0}, 'expected_output': {0: 1, 1: 2, 2: 2, 3: 4, 4: 5, 5: 5, 6: 5, 7: 8, 8: 1, 9: 2, 10: 2, 11: 4, 12: 5, 13: 5, 14: 5, 15: 8, 16: 1, 17: 2, 18: 2, 19: 4, 20: 5, 21: 6, 22: 6, 23: 6}}

import pandas as pd

df = pd.DataFrame(data_dict).set_index(['obj', 'seq'])

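With the df shown above, I want to create a column that, for every row, returns the seq of the last row with var >= 800 (up to and including the current row, as in expected_output) within each group defined by the obj index level. How can I do this?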

4 answers:

Answer 0: (score: 1)

OK, I think I figured it out:

df['new'] = df.groupby('obj', group_keys = False).apply(lambda x: (x['var'] >= 800).cumsum().rank(method = 'min'))

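Note: this only works when the values in seq start at 1 and increase by 1 within each group. If that is not the case, we have to do something like the following: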

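# rank - 1 is the 0-based position, within the group, of the last row with var >= 800
df['new'] = df.groupby('obj', group_keys=False).apply(
    lambda x: pd.Series(
        x.index.get_level_values('seq')[(x['var'] >= 800).cumsum().rank(method='min').astype(int).to_numpy() - 1],
        index=x.index))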
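To see why the cumsum/rank combination works, here is a minimal sketch on a single hypothetical group (the toy values and the names hits, run_id and last_pos are assumptions for illustration, not part of the question's data). Each row that meets the condition starts a new run, and rank(method='min') returns the 1-based position of the first row of each run, which is the last row that met the condition.

```python
import pandas as pd

# Hypothetical toy group: seq runs 1..5 and is used as the index.
var = pd.Series([1900.0, 100.0, 7800.0, 400.0, 300.0], index=[1, 2, 3, 4, 5])

hits = var >= 800                 # True where the condition holds
run_id = hits.cumsum()            # a hit and the misses that follow it share one id
# 'min' ranking gives every row of a run the position of the run's first row
last_pos = run_id.rank(method='min').astype(int)

print(last_pos.tolist())
```

One caveat with this trick: if a group starts with a row that fails the condition, that row still gets position 1, because its run id of 0 is ranked first.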

Answer 1: (score: 1)

A straightforward solution is to use np.where with np.nan together with the ffill() method:

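import numpy as np

# Keep seq where the condition holds, NaN elsewhere, then forward-fill per group
df['var2'] = np.where(df['var'] >= 800, df.index.get_level_values('seq'), np.nan)
df['var2'] = df.groupby('obj')['var2'].ffill().astype(int)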
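The final astype(int) relies on every group starting with a row that meets the condition, as the sample data does. If a group might start with a miss, the forward fill leaves NaN there and the integer cast fails. A possible workaround, sketched here on a hypothetical toy frame (the data and the column name last_seq are assumptions), is the nullable Int64 dtype:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: group 'b' starts with a row that fails the condition.
df = pd.DataFrame({
    'obj': ['a', 'a', 'a', 'b', 'b', 'b'],
    'seq': [1, 2, 3, 1, 2, 3],
    'var': [900.0, 100.0, 850.0, 100.0, 900.0, 100.0],
}).set_index(['obj', 'seq'])

seq = df.index.get_level_values('seq')
# seq where the condition holds, NaN elsewhere
marked = pd.Series(np.where(df['var'] >= 800, seq, np.nan), index=df.index)
# Forward-fill inside each group so values never leak across objects
filled = marked.groupby(level='obj').ffill()
# Nullable integer dtype keeps the leading gap as <NA> instead of raising
df['last_seq'] = filled.astype('Int64')

print(df)
```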

Answer 2: (score: 0)

Here is my approach, although I assume you know num_objects in advance:

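df = pd.DataFrame(data_dict).reset_index()
df['var'] = df['var'].astype('int64')
# Assumed upper bound on the object count; names without rows print an empty Series
num_objects = 10
for j in range(1, num_objects):
    # Last row of object j with var >= 800; prints its seq
    print(df[(df['var'] >= 800) & (df['obj'] == ('obj' + str(j)))][-1:]['seq'])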
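If the number of objects is not known in advance, the same per-object value can probably be obtained without a loop by filtering first and then taking the last seq in each group. This is a sketch on hypothetical toy data (the frame and the name last_per_obj are assumptions); like the loop, it yields one value per object rather than one per row.

```python
import pandas as pd

# Hypothetical toy data with 'obj' and 'seq' as ordinary columns.
df = pd.DataFrame({
    'obj': ['obj1', 'obj1', 'obj1', 'obj2', 'obj2', 'obj2'],
    'seq': [1, 2, 3, 1, 2, 3],
    'var': [1900, 100, 400, 600, 7800, 1300],
})

# Keep only rows meeting the condition, then take the last seq per object
last_per_obj = df[df['var'] >= 800].groupby('obj')['seq'].last()

print(last_per_obj)
```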

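Answer 3: (score: 0)

Check the condition, mapping False to NaN and True to 1. Then, using multiplication and cummax (valid because 'seq' is numeric and monotonically increasing), we get the last 'seq' in the group that satisfies the condition. We forward-fill within the group to replace the NaN values. Unfortunately this takes two groupby calls, but it ensures that the output column stays NaN until the first row in the 'obj' group that satisfies the condition.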

import numpy as np

df['result'] = ((df['var'].ge(800).map({False: np.nan, True: 1})
                 *df.index.get_level_values('seq'))
                 .groupby('obj').cummax()
                 .groupby('obj').ffill())
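The intermediate steps can be traced on a small example. The sketch below uses hypothetical toy data (the frame and the names flag, candidate, running and result are assumptions) in which group 'a' starts with a row that fails the condition, so the leading NaN is visible in the result.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: group 'a' starts with a miss, group 'b' with a hit.
df = pd.DataFrame({
    'obj': ['a', 'a', 'a', 'b', 'b', 'b'],
    'seq': [1, 2, 3, 1, 2, 3],
    'var': [100.0, 900.0, 100.0, 900.0, 100.0, 850.0],
}).set_index(['obj', 'seq'])

seq = df.index.get_level_values('seq').to_numpy()

flag = df['var'].ge(800).map({False: np.nan, True: 1})  # 1 for a hit, NaN for a miss
candidate = flag * seq                                   # seq for hits, NaN for misses
running = candidate.groupby(level='obj').cummax()        # NaN rows stay NaN here
result = running.groupby(level='obj').ffill()            # carry the last hit forward

print(result)
```

The first row of group 'a' stays NaN because no earlier row in that group meets the condition.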