I have a df that tracks the status of issues, from "Open" through "In Progress" to "Closed", as shown below:
T1 T2 T3 T4 T5
1 Open In Progress Closed
2 In Progress Closed
3 Open In Progress Open Closed
4 Open In Progress Closed Open Closed
5 Open In Progress Closed
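Basically, I want to find all issues that were reopened. A reopened issue is any row that has a Closed value followed by a later transition. For example, index 4 has a Closed value in T3, but T4 then contains a value, which means the issue was reopened.
The expected output would be: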
T1 T2 T3 T4 T5 Reopened
1 Open In Progress Closed 0
2 In Progress Closed 0
3 Open In Progress Open Closed 0
4 Open In Progress Closed Open Closed 1
5 Open In Progress Closed 0
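In the real df, the columns range from T1 to T25 and there are 50,000 rows.
So basically I need to check each column for Closed and, where it is found, check whether the next column is not empty.
Thanks
Answer 0 (score: 4)
I think you need: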
df['Reopened'] = ((df == 'Open') & ((df.shift(axis=1)) == 'Closed')).any(axis=1).astype(int)
print (df)
T1 T2 T3 T4 T5 Reopened
1 Open In Progress Closed NaN NaN 0
2 In Progress Closed NaN NaN NaN 0
3 Open In Progress Open Closed NaN 0
4 Open In Progress Closed Open Closed 1
5 Open In Progress Closed NaN NaN 0
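For anyone who wants to run this, here is a minimal self-contained sketch. The sample DataFrame is rebuilt from the printed output above, assuming empty cells are NaN; the names is_open and prev_closed are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the printed output; NaN marks an empty cell (assumption).
df = pd.DataFrame(
    [
        ["Open", "In Progress", "Closed", np.nan, np.nan],
        ["In Progress", "Closed", np.nan, np.nan, np.nan],
        ["Open", "In Progress", "Open", "Closed", np.nan],
        ["Open", "In Progress", "Closed", "Open", "Closed"],
        ["Open", "In Progress", "Closed", np.nan, np.nan],
    ],
    index=[1, 2, 3, 4, 5],
    columns=["T1", "T2", "T3", "T4", "T5"],
)

# True where the cell itself is "Open".
is_open = df == "Open"
# True where the cell one column to the left is "Closed".
prev_closed = df.shift(axis=1) == "Closed"

# A row is flagged when any "Open" directly follows a "Closed".
df["Reopened"] = (is_open & prev_closed).any(axis=1).astype(int)
print(df)
```

Both masks are computed before the Reopened column is assigned, so the new column never takes part in the comparison.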
Details:
Check each value of df for Open:
print ((df == 'Open'))
T1 T2 T3 T4 T5
1 True False False False False
2 False False False False False
3 True False True False False
4 True False False True False
5 True False False False False
Then shift the DataFrame one column to the right, so that each cell holds the value of the previous column:
print (df.shift(axis=1))
T1 T2 T3 T4 T5
1 NaN Open In Progress Closed NaN
2 NaN In Progress Closed NaN NaN
3 NaN Open In Progress Open Closed
4 NaN Open In Progress Closed Open
5 NaN Open In Progress Closed NaN
Check the shifted DataFrame for Closed:
print ((df.shift(axis=1)) == 'Closed')
T1 T2 T3 T4 T5
1 False False False True False
2 False False True False False
3 False False False False True
4 False False False True False
5 False False False True False
Chain both masks together with & (element-wise AND):
print (((df == 'Open') & ((df.shift(axis=1)) == 'Closed')))
T1 T2 T3 T4 T5
1 False False False False False
2 False False False False False
3 False False False False False
4 False False False True False
5 False False False False False
Then use any to check whether each row has at least one True:
print (((df == 'Open') & ((df.shift(axis=1)) == 'Closed')).any(axis=1))
1 False
2 False
3 False
4 True
5 False
dtype: bool
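The same row mask can also be built by comparing adjacent column slices of the underlying NumPy array instead of calling shift. This is only a sketch of an equivalent formulation, not part of the original answer; the sample data assumes NaN for empty cells, and whether it is any faster on 50,000 rows would need to be measured.

```python
import numpy as np
import pandas as pd

# Same sample data as in the answer; NaN marks an empty cell (assumption).
df = pd.DataFrame(
    [
        ["Open", "In Progress", "Closed", np.nan, np.nan],
        ["In Progress", "Closed", np.nan, np.nan, np.nan],
        ["Open", "In Progress", "Open", "Closed", np.nan],
        ["Open", "In Progress", "Closed", "Open", "Closed"],
        ["Open", "In Progress", "Closed", np.nan, np.nan],
    ],
    index=[1, 2, 3, 4, 5],
    columns=["T1", "T2", "T3", "T4", "T5"],
)

values = df.to_numpy(dtype=object)

# Columns T1..T4 act as the "previous" status, T2..T5 as the "current" one.
prev_closed = values[:, :-1] == "Closed"
curr_open = values[:, 1:] == "Open"

# One boolean per row: True if any Closed is directly followed by Open.
reopened = pd.Series((prev_closed & curr_open).any(axis=1), index=df.index)
print(reopened)
```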
Finally, convert the boolean mask to integers with astype and assign it to a new column.
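Note that the solution above only flags Closed followed directly by Open. The question describes a reopened issue as Closed followed by any non-empty value, so if other statuses can follow Closed, a variant along these lines may fit better. It is a sketch that assumes empty cells are NaN (not empty strings); the two-row frame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row example: the second issue goes Closed -> In Progress.
df = pd.DataFrame(
    [
        ["Open", "In Progress", "Closed", np.nan, np.nan],
        ["Open", "In Progress", "Closed", "In Progress", "Closed"],
    ],
    columns=["T1", "T2", "T3", "T4", "T5"],
)

# True where the previous column is "Closed".
prev_closed = df.shift(axis=1).eq("Closed")
# True where the current cell holds any value at all.
has_value = df.notna()

# Flag rows where some non-empty status follows a "Closed".
df["Reopened"] = (prev_closed & has_value).any(axis=1).astype(int)
print(df)
```

With the Open-only check the second row would get 0, because the value after Closed is In Progress rather than Open.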