data_dict = {'obj': {0: 'obj1', 1: 'obj1', 2: 'obj1', 3: 'obj1', 4: 'obj1', 5: 'obj1', 6: 'obj1', 7: 'obj1', 8: 'obj2', 9: 'obj2', 10: 'obj2', 11: 'obj2', 12: 'obj2', 13: 'obj2', 14: 'obj2', 15: 'obj2', 16: 'obj3', 17: 'obj3', 18: 'obj3', 19: 'obj3', 20: 'obj3', 21: 'obj3', 22: 'obj3', 23: 'obj3'}, 'seq': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 1, 9: 2, 10: 3, 11: 4, 12: 5, 13: 6, 14: 7, 15: 8, 16: 1, 17: 2, 18: 3, 19: 4, 20: 5, 21: 6, 22: 7, 23: 8}, 'var': {0: 1900.0, 1: 3100.0, 2: 100.0, 3: 7800.0, 4: 1300.0, 5: 100.0, 6: 400.0, 7: 4800.0, 8: 1900.0, 9: 2600.0, 10: 600.0, 11: 7800.0, 12: 1300.0, 13: 100.0, 14: 400.0, 15: 4800.0, 16: 1900.0, 17: 2600.0, 18: 500.0, 19: 7900.0, 20: 1800.0, 21: 4800.0, 22: 300.0, 23: 300.0}, 'expected_output': {0: 1, 1: 2, 2: 2, 3: 4, 4: 5, 5: 5, 6: 5, 7: 8, 8: 1, 9: 2, 10: 2, 11: 4, 12: 5, 13: 5, 14: 5, 15: 8, 16: 1, 17: 2, 18: 2, 19: 4, 20: 5, 21: 6, 22: 6, 23: 6}}
import numpy as np
import pandas as pd
df = pd.DataFrame(data_dict).set_index(['obj', 'seq'])
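Using the df introduced above, I want to create a column that, for each row, returns the seq value of the last row (up to and including the current one) with var >= 800, within each group defined by the obj index level. How can I do this?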
Answer 0 (score: 1)
OK, I think I figured it out:
df['new'] = df.groupby('obj', group_keys = False).apply(lambda x: (x['var'] >= 800).cumsum().rank(method = 'min'))
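Note: this only works when the values in seq start at 1 and increase in steps of 1.
If that is not the case, we have to do something like the following: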
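# Look 'seq' up by position inside each group, so 'seq' need not start at 1 or step by 1
def last_match_seq(g):
    pos = (g['var'] >= 800).cumsum().rank(method='min').astype(int) - 1
    return pd.Series(g['seq'].to_numpy()[pos.to_numpy()], index=g.index)

df['new'] = df.reset_index().groupby('obj', group_keys=False).apply(last_match_seq).to_numpy()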
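To see why the cumsum/rank combination points at the last matching row, it can help to look at the intermediate values for a single group. The sketch below uses a stand-alone Series whose values mirror obj1 from the question; the names hits, run_id, position and steps are illustrative only and do not appear in the answer.

```python
import pandas as pd

# One group on its own; the values mirror obj1 from the question's data.
var = pd.Series([1900.0, 3100.0, 100.0, 7800.0, 1300.0, 100.0, 400.0, 4800.0])

hits = var >= 800          # True where the condition holds
run_id = hits.cumsum()     # each row shares an id with the latest matching row
# rank(method='min') gives every row the 1-based position of the first row in its run
position = run_id.rank(method='min').astype(int)

steps = pd.DataFrame({'var': var, 'hit': hits, 'run_id': run_id, 'position': position})
print(steps)
```

The first row of each run is the matching row itself, which is why the position doubles as seq when seq counts 1, 2, 3, ... Note that this relies on every group starting with a matching row: if the first rows are below 800 they all get run_id 0 and position 1, which points at a row that does not satisfy the condition.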
Answer 1 (score: 1)
A straightforward solution is to use np.nan together with the ffill() method:
df['var2'] = np.where(df['var'] >= 800, df.index.get_level_values('seq'), np.nan)
df['var2'] = df.groupby('obj')['var2'].ffill().astype(int)
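One caveat worth checking: astype(int) cannot represent NaN, so the cast fails if any group starts with rows below the threshold. A minimal sketch of a variant that tolerates this, assuming pandas' nullable 'Int64' dtype is acceptable for the output; the toy frame is made-up data, not the question's df.

```python
import numpy as np
import pandas as pd

# Made-up data: group 'b' starts with a row below the threshold.
toy = pd.DataFrame({
    'obj': ['a', 'a', 'a', 'b', 'b', 'b'],
    'seq': [1, 2, 3, 1, 2, 3],
    'var': [900.0, 100.0, 850.0, 100.0, 900.0, 100.0],
}).set_index(['obj', 'seq'])

# Keep 'seq' where the condition holds, NaN elsewhere
seq = toy.index.get_level_values('seq')
toy['var2'] = np.where(toy['var'] >= 800, seq, np.nan)

# Carry the last matching 'seq' forward within each group;
# 'Int64' (capital I) keeps leading gaps as <NA> instead of raising
toy['var2'] = toy.groupby('obj')['var2'].ffill().astype('Int64')
print(toy)
```

With the question's data every group begins with a row where var >= 800, so the plain astype(int) works there as well.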
Answer 2 (score: 0)
Here is my solution, although I am assuming you know num_objects in advance:
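df = pd.DataFrame(data_dict).reset_index()
df['var'] = df['var'].astype('int64')
num_objects = 3  # the sample data contains obj1 through obj3
for j in range(1, num_objects + 1):
    # Print the 'seq' of the last row of obj<j> where var >= 800
    print(df[(df['var'] >= 800) & (df['obj'] == 'obj' + str(j))][-1:]['seq'])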
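The loop prints one value per object rather than filling a column, and it needs the object count up front. As a sketch of a different technique (groupby instead of the loop, so not what this answer does), the same per-object values can be collected without knowing num_objects. The frame flat and the name last_seq are made up for illustration.

```python
import pandas as pd

# Made-up flat frame with 'obj' and 'seq' as ordinary columns.
flat = pd.DataFrame({
    'obj': ['obj1', 'obj1', 'obj1', 'obj2', 'obj2', 'obj2'],
    'seq': [1, 2, 3, 1, 2, 3],
    'var': [1900, 100, 4800, 2600, 900, 100],
})

# Filter to the matching rows, then take the last 'seq' per object
last_seq = flat[flat['var'] >= 800].groupby('obj')['seq'].last()
print(last_seq)
```

This yields a single value per object (the overall last match), not the row-by-row running value the question asks for.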
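Answer 3 (score: 0)
Check the condition, mapping False to NaN and True to 1. Then, by multiplying and taking cummax (valid because 'seq' is numeric and monotonically increasing), we get the last 'seq' in the group that satisfies the condition. We forward-fill within the group to replace the NaN values. Unfortunately this uses two groupbys, but it ensures that the output column stays NaN until the first row in the 'obj' group that satisfies the condition.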
df['result'] = ((df['var'].ge(800).map({False: np.nan, True: 1})
                 * df.index.get_level_values('seq'))
                .groupby('obj').cummax()
                .groupby('obj').ffill())
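The question's data never exercises the "stays NaN until the first match" behaviour, because every group opens with var >= 800. A small sketch on made-up data where group 'a' starts below the threshold; the names toy, mask and marked are illustrative only.

```python
import numpy as np
import pandas as pd

# Made-up data: group 'a' starts with a row below the threshold.
toy = pd.DataFrame({
    'obj': ['a', 'a', 'a', 'b', 'b', 'b'],
    'seq': [1, 2, 3, 1, 2, 3],
    'var': [100.0, 900.0, 100.0, 900.0, 100.0, 850.0],
}).set_index(['obj', 'seq'])

# 1 where the condition holds, NaN elsewhere
mask = toy['var'].ge(800).map({False: np.nan, True: 1})
# Multiplying by 'seq' leaves the 'seq' value on matching rows only
marked = mask * toy.index.get_level_values('seq')
# Running maximum, then forward-fill, both restricted to each 'obj' group
toy['result'] = marked.groupby('obj').cummax().groupby('obj').ffill()
print(toy)
```

The first row of group 'a' keeps NaN because nothing before it in the same group matched, and the fill does not leak across the group boundary.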