If I see A & T consecutively,
If I see G & C consecutively,
Original DataFrame
import pandas as pd

df = pd.DataFrame([['A', '3'], ['T', '4'],
                   ['A', '3'], ['A', '4'],
                   ['G', '3'], ['C', '4'],
                   ['T', '1']],
                  columns=['Flag', 'Value'])
# Bookkeeping columns, all False to start with
df['found'] = False
df['remove'] = False
print(df)
  Flag Value  found  remove
0    A     3  False   False
1    T     4  False   False
2    A     3  False   False
3    A     4  False   False
4    G     3  False   False
5    C     4  False   False
6    T     1  False   False
Desired DataFrame
  Flag Value  found  remove
0    A     3   True   False
1    T     3  False    True
2    A     3  False   False
3    A     4  False   False
4    G     4   True   False
5    C     3  False    True
6    T     1  False   False
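Answer 0 (score: 1)
I would create a few temporary columns to track the lagged flag and value, plus the next flag. Then you can compare them directly: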
df['prior_flag'] = df.Flag.shift()
df['next_flag'] = df.Flag.shift(-1)
df['prior_value'] = df.Value.shift()
# Check for 'A' followed by 'T'
df.loc[(df.Flag == 'A') & (df.next_flag == 'T'), 'found'] = True
df.loc[(df.Flag == 'T') & (df.prior_flag == 'A'), 'remove'] = True
df.loc[(df.Flag == 'T') & (df.prior_flag == 'A'), 'Value'] = \
df.loc[(df.Flag == 'T') & (df.prior_flag == 'A'), 'prior_value']
# Check for 'G' followed by 'C'
df.loc[(df.Flag == 'G') & (df.next_flag == 'C'), 'found'] = True
df.loc[(df.Flag == 'C') & (df.prior_flag == 'G'), 'remove'] = True
temp = df.loc[(df.Flag == 'G') & (df.next_flag == 'C'), 'Value'].values
df.loc[(df.Flag == 'G') & (df.next_flag == 'C'), 'Value'] = \
df.loc[(df.Flag == 'C') & (df.prior_flag == 'G'), 'Value'].values
df.loc[(df.Flag == 'C') & (df.prior_flag == 'G'), 'Value'] = temp
df.drop(['next_flag', 'prior_flag', 'prior_value'], axis=1, inplace=True)
>>> df
  Flag Value  found  remove
0    A     3   True   False
1    T     3  False    True
2    A     3  False   False
3    A     4  False   False
4    G     4   True   False
5    C     3  False    True
6    T     1  False   False
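Because you want to swap the values when G is followed by C, I created a temporary variable, temp, to hold an intermediate copy of the G values while they are overwritten.
All of the temporary columns are then dropped at the end.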
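To make the temporary columns less opaque, here is a minimal sketch of what shift() and shift(-1) produce for the Flag column from the question. The names flags and neighbours are only illustrative and are not part of the code above.

```python
import pandas as pd

flags = pd.Series(['A', 'T', 'A', 'A', 'G', 'C', 'T'], name='Flag')

# shift() moves values down one row, so each row sees the previous flag;
# shift(-1) moves them up one row, so each row sees the next flag.
neighbours = pd.DataFrame({
    'Flag': flags,
    'prior_flag': flags.shift(),
    'next_flag': flags.shift(-1),
})
print(neighbours)

# Row 0 has no predecessor and row 6 has no successor, so those cells are
# missing values and never compare equal to a flag such as 'A' or 'G'.
```

Since the missing values at the edges never match a flag, the first and last rows need no special handling in the comparisons.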
To view the remaining rows that have not been marked for removal:
>>> df[~df.remove]
  Flag Value  found  remove
0    A     3   True   False
2    A     3  False   False
3    A     4  False   False
4    G     4   True   False
6    T     1  False   False
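If you need this more than once, the same steps could be wrapped in a function that builds each boolean mask once instead of repeating the expression. This is only a sketch of the approach above: mark_pairs is a hypothetical name, and it assumes the rules are exactly "A then T copies A's value onto T" and "G then C swaps the two values".

```python
import pandas as pd

def mark_pairs(df):
    """Hypothetical helper: flag A->T and G->C pairs on a copy of df."""
    out = df.copy()
    out['found'] = False
    out['remove'] = False
    prior_flag = out['Flag'].shift()
    next_flag = out['Flag'].shift(-1)

    # A followed by T: flag the A, mark the T, copy A's value onto the T.
    a_first = (out['Flag'] == 'A') & (next_flag == 'T')
    t_second = (out['Flag'] == 'T') & (prior_flag == 'A')
    out.loc[a_first, 'found'] = True
    out.loc[t_second, 'remove'] = True
    out.loc[t_second, 'Value'] = out['Value'].shift()[t_second]

    # G followed by C: flag the G, mark the C, swap the two values.
    g_first = (out['Flag'] == 'G') & (next_flag == 'C')
    c_second = (out['Flag'] == 'C') & (prior_flag == 'G')
    out.loc[g_first, 'found'] = True
    out.loc[c_second, 'remove'] = True
    g_values = out.loc[g_first, 'Value'].to_numpy()  # saved before overwriting
    out.loc[g_first, 'Value'] = out.loc[c_second, 'Value'].to_numpy()
    out.loc[c_second, 'Value'] = g_values
    return out

df = pd.DataFrame([['A', '3'], ['T', '4'], ['A', '3'], ['A', '4'],
                   ['G', '3'], ['C', '4'], ['T', '1']],
                  columns=['Flag', 'Value'])
result = mark_pairs(df)
print(result[~result['remove']])
```

Working on a copy leaves the caller's frame untouched, and no temporary columns need to be dropped afterwards because the shifted flags live in local variables.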