Find and swap occurrences in text

Time: 2018-01-31 07:58:56

Tags: python regex

I have a text file with the following input:

update dbo.pc set ab_snus = '1' where ab_pb = 'aeiou' and ab_snus = '2'
update dbo.und set und_ben = '' where und_k = 'UB' AND und_ben = 'Bl'
update dbo.pc set ab_snus = '3' where ab_pb = 'aeiou' and ab_snus = '4'
update dbo.und set und_ben = '' where und_k = 'PC' AND und_ben = 'Bo'

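What I want to do is swap the first ab_snus value with the second ab_snus value on each line, and likewise swap the two und_ben values, so that the output is: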

update dbo.pc set ab_snus = '2' where ab_pb = 'aeiou' and ab_snus = '1'
update dbo.und set und_ben = 'Bl' where und_k = 'UB' AND und_ben = ''
update dbo.pc set ab_snus = '4' where ab_pb = 'aeiou' and ab_snus = '3'
update dbo.und set und_ben = 'Bo' where und_k = 'PC' AND und_ben = ''

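Most of these values can be captured with the regex '([a-zA-Z\d]+)', but for the empty values, i.e. '', I am at a loss. Perhaps some kind of re.finditer(r'\'\'') is needed?

'([a-zA-Z\d]+)' also matches the und_k value, which is not wanted.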

import re
text = '''
update dbo.pc set ab_snus = '1' where ab_pb = 'aeiou' and ab_snus = '2'
update dbo.und set und_ben = '' where und_k = 'UB' AND und_ben = 'Bl'
update dbo.pc set ab_snus = '3' where ab_pb = 'aeiou' and ab_snus = '4'
update dbo.und set und_ben = '' where und_k = 'PC' AND und_ben = 'Bo'
'''
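matchsnus, matchund = [], []
for match in re.finditer(r'\'([a-zA-Z\d]+)\'', text):
    # group(1) is the value between the quotes; group(0) would include the quotes
    matchsnus.append(match.group(1))
print(matchsnus)

This returns the following output: ['1', 'aeiou', '2', 'UB', 'Bl', '3', 'aeiou', '4', 'PC', 'Bo']. Note that the empty '' values are not captured at all. Would a reasonable approach be to find all occurrences of ab_snus and und_ben, append them to their respective arrays, and then apply some logic to swap match 0 with 1, 2 with 3, and so on?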

TL;DR: How do I swap the data on each line for ab_snus and und_ben?

3 answers:

Answer 0: (score: 1)

You can use

\b((ab_snus|und_ben)\s*=\s*)('\w*')(.*\b\2\s*=\s*)('\w*')

and replace with \1\5\4\3.

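See the regex demo.

Details

  • \b - a word boundary
  • ((ab_snus|und_ben)\s*=\s*) - Group 1 (referenced with the \1 backreference from the replacement pattern):
    • (ab_snus|und_ben) - Group 2 (referenced with the \2 backreference later in the regex pattern): either ab_snus or und_ben
    • \s*=\s* - a = enclosed with 0+ whitespace characters
  • ('\w*') - Group 3 (referenced with the \3 backreference from the replacement pattern): a ', zero or more word characters (you may also use [^']* to match 0+ characters other than '), and a '
  • (.*\b\2\s*=\s*) - Group 4 (referenced with the \4 backreference from the replacement pattern):
    • .*\b\2 - any 0+ characters other than line break characters, as many as possible, followed by the same value as captured in Group 2 (matched as a whole word thanks to the word boundary)
    • \s*=\s* - a = enclosed with 0+ whitespace characters
  • ('\w*') - Group 5 (referenced with the \5 backreference from the replacement pattern): a ', zero or more word characters (you may also use [^']* to match 0+ characters other than '), and a '.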
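To make the group breakdown above concrete, here is a minimal sketch that runs the same pattern against one of the sample lines with re.search and prints each captured group. The variable names line, m and groups are illustrative only and are not part of the original answer.

```python
import re

rx = r"\b((ab_snus|und_ben)\s*=\s*)('\w*')(.*\b\2\s*=\s*)('\w*')"
line = "update dbo.und set und_ben = '' where und_k = 'UB' AND und_ben = 'Bl'"

m = re.search(rx, line)
groups = m.groups()  # tuple of Group 1 .. Group 5

# Print each group with its number so the pieces of the line are visible
for number, value in enumerate(groups, start=1):
    print(number, repr(value))
```

Groups 3 and 5 are the two quoted values, so emitting them in the order \1\5\4\3 swaps them while leaving everything in between untouched.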

Python demo

import re
rx = r"\b((ab_snus|und_ben)\s*=\s*)('\w*')(.*\b\2\s*=\s*)('\w*')"
s = ("update dbo.pc set ab_snus = '1' where ab_pb = 'aeiou' and ab_snus = '2'\n"
    "update dbo.und set und_ben = '' where und_k = 'UB' AND und_ben = 'Bl'\n"
    "update dbo.pc set ab_snus = '3' where ab_pb = 'aeiou' and ab_snus = '4'\n"
    "update dbo.und set und_ben = '' where und_k = 'PC' AND und_ben = 'Bo'")
result = re.sub(rx, r"\1\5\4\3", s)
print(result)

Result:

update dbo.pc set ab_snus = '2' where ab_pb = 'aeiou' and ab_snus = '1'
update dbo.und set und_ben = 'Bl' where und_k = 'UB' AND und_ben = ''
update dbo.pc set ab_snus = '4' where ab_pb = 'aeiou' and ab_snus = '3'
update dbo.und set und_ben = 'Bo' where und_k = 'PC' AND und_ben = ''
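The details above mention that [^']* can stand in for \w*. A rough sketch of that variant follows, assuming a hypothetical input line whose value contains a space and a hyphen (such a line does not appear in the question). With '\w*' that line would be left unchanged, because \w does not match spaces or hyphens.

```python
import re

# Same structure as the pattern above, but '[^']*' replaces '\w*'
rx_any = r"\b((ab_snus|und_ben)\s*=\s*)('[^']*')(.*\b\2\s*=\s*)('[^']*')"
rx_word = r"\b((ab_snus|und_ben)\s*=\s*)('\w*')(.*\b\2\s*=\s*)('\w*')"

# Hypothetical line: the second value contains a space and a hyphen
s = "update dbo.und set und_ben = '' where und_k = 'UB' AND und_ben = 'Bl a-b'"

swapped = re.sub(rx_any, r"\1\5\4\3", s)
unchanged = re.sub(rx_word, r"\1\5\4\3", s)  # no match, so s comes back as-is

print(swapped)
print(unchanged == s)
```

Neither variant handles values that themselves contain an escaped quote, so this is only suitable when the quoted values are known to be free of ' characters.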

Answer 1: (score: 1)

Two replacement passes with the re.sub() function:

import re

text = '''
update dbo.pc set ab_snus = '1' where ab_pb = 'aeiou' and ab_snus = '2'
update dbo.und set und_ben = '' where und_k = 'UB' AND und_ben = 'Bl'
update dbo.pc set ab_snus = '3' where ab_pb = 'aeiou' and ab_snus = '4'
update dbo.und set und_ben = '' where und_k = 'PC' AND und_ben = 'Bo'
'''

text = re.sub(r"(update .+\bab_snus = ')([^']*)(' .+\bab_snus = ')([^']*)'", "\\1\\4\\3\\2'", text)
text = re.sub(r"(update .+\bund_ben = ')([^']*)(' .+\bund_ben = ')([^']*)'", "\\1\\4\\3\\2'", text)

print(text)

Output:

update dbo.pc set ab_snus = '2' where ab_pb = 'aeiou' and ab_snus = '1'
update dbo.und set und_ben = 'Bl' where und_k = 'UB' AND und_ben = ''
update dbo.pc set ab_snus = '4' where ab_pb = 'aeiou' and ab_snus = '3'
update dbo.und set und_ben = 'Bo' where und_k = 'PC' AND und_ben = ''
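If more columns need the same treatment, the two passes above could be folded into a loop. The following is only a sketch under that assumption: the helper name swap_values and its columns parameter are hypothetical, and the pattern is the one from this answer with the column name substituted in.

```python
import re

def swap_values(text, columns):
    """Swap the first and second quoted value of each named column, line by line."""
    for column in columns:
        name = re.escape(column)  # guard against regex metacharacters in the name
        pattern = rf"(update .+\b{name} = ')([^']*)(' .+\b{name} = ')([^']*)'"
        # Groups 2 and 4 are the two values; emit them in swapped order
        text = re.sub(pattern, r"\1\4\3\2'", text)
    return text

text = (
    "update dbo.pc set ab_snus = '1' where ab_pb = 'aeiou' and ab_snus = '2'\n"
    "update dbo.und set und_ben = '' where und_k = 'UB' AND und_ben = 'Bl'\n"
)
swapped = swap_values(text, ["ab_snus", "und_ben"])
print(swapped)
```

Because . does not match a newline by default, each pass stays within a single line, which is why one call can process the whole text.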

Answer 2: (score: 1)

Why use a regex for this task when you can do it without one:

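# Open the output file once instead of reopening it for every line
with open('current.txt', 'r') as f, open('new_file.txt', 'w') as ff:
    for line in f:
        data = line.split()
        if len(data) < 14:
            # Not in the expected shape (e.g. a blank line): copy it through
            ff.write(line)
            continue
        # Tokens 5 and 13 are the two quoted values to swap
        data[5], data[13] = data[13], data[5]
        ff.write(" ".join(data) + '\n')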

Output:

update dbo.pc set ab_snus = '2' where ab_pb = 'aeiou' and ab_snus = '1'
update dbo.und set und_ben = 'Bl' where und_k = 'UB' AND und_ben = ''
update dbo.pc set ab_snus = '4' where ab_pb = 'aeiou' and ab_snus = '3'
update dbo.und set und_ben = 'Bo' where und_k = 'PC' AND und_ben = ''
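The fixed indices 5 and 13 only work while every line has exactly this shape. As a hedged sketch of a slightly more tolerant variant, the value tokens can be located by column name instead. The function name swap_column_values is hypothetical, and the second example line with an extra assignment is made up for illustration.

```python
def swap_column_values(line, column):
    """Swap the value tokens that follow the first two `column =` token pairs."""
    tokens = line.split()
    # A value sits two tokens after a `column` token that is followed by '='
    positions = [i + 2 for i, tok in enumerate(tokens[:-2])
                 if tok == column and tokens[i + 1] == "="]
    if len(positions) >= 2:
        first, second = positions[:2]
        tokens[first], tokens[second] = tokens[second], tokens[first]
    return " ".join(tokens)

line_a = "update dbo.pc set ab_snus = '1' where ab_pb = 'aeiou' and ab_snus = '2'"
# Hypothetical line with an extra assignment before ab_snus
line_b = "update dbo.pc set x = '9', ab_snus = '1' where ab_snus = '2'"

swapped_a = swap_column_values(line_a, "ab_snus")
swapped_b = swap_column_values(line_b, "ab_snus")
print(swapped_a)
print(swapped_b)
```

Like the original split-based answer, this still assumes that values contain no spaces and that collapsing runs of whitespace into single spaces is acceptable.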