So this is df_wrong:
import pandas as pd

df_wrong = pd.DataFrame({
    'stage_name': ['Applied', 'Screen call', 'Hometask', '2nd interview',
                   'Hometask review', 'Screen call', '2nd interview'],
    'stage_num': [1, 2, 3, 6, 4, 2, 6],
    'stage_time_mooving_in': ['2018-08-10 12:00:00', '2018-08-10 13:00:00', '2018-08-10 14:00:00',
                              '2018-08-10 15:00:00', '2018-08-10 16:00:00', '2018-08-10 17:00:00',
                              '2018-08-10 19:00:00'],
})
I want to create an algorithm that turns the wrong table into the right one:
df_right = pd.DataFrame({
    'stage_name': ['Applied', 'Screen call', 'Hometask', 'Hometask review',
                   '2nd interview'],
    'stage_num': [1, 2, 3, 4, 6],
    'stage_time_mooving_in': ['2018-08-10 12:00:00', '2018-08-10 13:00:00',
                              '2018-08-10 14:00:00', '2018-08-10 16:00:00',
                              '2018-08-10 19:00:00'],
})
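My question is how to build such an algorithm. I tried sorting the df and dropping duplicates, but I don't know how to make that work for every case.
Every stage that can appear comes from the pipeline listed in this table: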
full_pipeline = pd.DataFrame({
    'stage_name': ['Applied', 'Screen call',
                   'Hometask', 'Hometask review',
                   '1st interview', '2nd interview',
                   'Final interview', 'Offer'],
    'stage_num': [1, 2, 3, 4, 5, 6, 7, 8],
})
Note: there are a few hints that may help in building the algorithm:
Answer 0: (score: 0)
After talking with a colleague, I came up with the following answer:
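def lis(a):
    # L[k] holds the indices of the longest strictly increasing
    # subsequence of a that ends at position k
    L = []
    for k, v in enumerate(a):
        L.append(max([L[i] for i, n in enumerate(a[:k]) if n < v] or [[]], key=len) + [k])
    # default=[] keeps an empty input from raising ValueError
    return max(L, key=len, default=[])

# Indices of the rows that form the longest increasing run of stage_num
right_index = lis(list(df_wrong.loc[:, 'stage_num']))
df_wrong[df_wrong.index.isin(right_index)]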
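The function above is a longest strictly increasing subsequence search that returns indices rather than values, and it runs in roughly quadratic time. For longer histories, a variant based on binary search may be worth considering. The following is only a sketch, not part of the original answer: the name lis_indices and its helper lists are made up for illustration, and when several subsequences share the maximum length it may pick a different one than lis does.

```python
from bisect import bisect_left


def lis_indices(values):
    """Return the indices of one longest strictly increasing subsequence."""
    tails = []      # tails[j]: smallest last value of an increasing run of length j + 1
    tails_idx = []  # position in `values` of each entry in `tails`
    prev = [-1] * len(values)  # predecessor links used to rebuild the run

    for i, v in enumerate(values):
        # bisect_left keeps the subsequence strictly increasing
        j = bisect_left(tails, v)
        if j > 0:
            prev[i] = tails_idx[j - 1]
        if j == len(tails):
            tails.append(v)
            tails_idx.append(i)
        else:
            tails[j] = v
            tails_idx[j] = i

    # Walk the predecessor links back from the end of the longest run
    result = []
    i = tails_idx[-1] if tails_idx else -1
    while i != -1:
        result.append(i)
        i = prev[i]
    return result[::-1]


stage_nums = [1, 2, 3, 6, 4, 2, 6]  # the stage_num column of df_wrong
kept = lis_indices(stage_nums)
```

With the DataFrame from the question, the kept rows could then be selected positionally, for example with df_wrong.iloc[lis_indices(df_wrong['stage_num'].tolist())], which does not rely on the index being the default 0..n-1 range.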
Feel free to suggest your own solutions.