I have a text file with the following input:
update dbo.pc set ab_snus = '1' where ab_pb = 'aeiou' and ab_snus = '2'
update dbo.und set und_ben = '' where und_k = 'UB' AND und_ben = 'Bl'
update dbo.pc set ab_snus = '3' where ab_pb = 'aeiou' and ab_snus = '4'
update dbo.und set und_ben = '' where und_k = 'PC' AND und_ben = 'Bo'
What I want to do is swap the first ab_snus value with the second ab_snus value, and likewise swap the two und_ben values, so that the output is:
update dbo.pc set ab_snus = '2' where ab_pb = 'aeiou' and ab_snus = '1'
update dbo.und set und_ben = 'Bl' where und_k = 'UB' AND und_ben = ''
update dbo.pc set ab_snus = '4' where ab_pb = 'aeiou' and ab_snus = '3'
update dbo.und set und_ben = 'Bo' where und_k = 'PC' AND und_ben = ''
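Most of these values can be captured with the regular expression '([a-zA-Z\d]+)', but I am at a loss for the empty value, i.e. ''; perhaps some kind of re.finditer(r'\'\'') is needed. '([a-zA-Z\d]+)' also matches the und_k value, which is not wanted.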
import re
text = '''
update dbo.pc set ab_snus = '1' where ab_pb = 'aeiou' and ab_snus = '2'
update dbo.und set und_ben = '' where und_k = 'UB' AND und_ben = 'Bl'
update dbo.pc set ab_snus = '3' where ab_pb = 'aeiou' and ab_snus = '4'
update dbo.und set und_ben = '' where und_k = 'PC' AND und_ben = 'Bo'
'''
# Collect every quoted alphanumeric value (the empty '' is never matched)
matchsnus, matchund = [], []
for match in re.finditer(r'\'([a-zA-Z\d]+)\'', text):
    matchsnus.append(match.group(0))
print(matchsnus)
This returns the following output:
["'1'", "'aeiou'", "'2'", "'UB'", "'Bl'", "'3'", "'aeiou'", "'4'", "'PC'", "'Bo'"]
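Would a reasonable approach be to find all occurrences of ab_snus and und_ben, append them to their respective lists, and then apply some logic to swap match 0 with 1, 2 with 3, and so on?
TL;DR: How do I swap the two values assigned to ab_snus and to und_ben on each line?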
Answer 0 (score: 1)
You can use
\b((ab_snus|und_ben)\s*=\s*)('\w*')(.*\b\2\s*=\s*)('\w*')
and replace with \1\5\4\3.
See the regex demo.
Details
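\b - a word boundary
((ab_snus|und_ben)\s*=\s*) - Group 1 (referred to with the \1 backreference from the replacement pattern):
    (ab_snus|und_ben) - Group 2 (referred to with the \2 backreference from within the regex pattern): ab_snus or und_ben
    \s*=\s* - a = enclosed with 0+ whitespace characters
('\w*') - Group 3 (referred to with the \3 backreference from the replacement pattern): a ', zero or more word characters (you may also use [^']* to match 0+ characters other than '), and a '
(.*\b\2\s*=\s*) - Group 4 (referred to with the \4 backreference from the replacement pattern):
    .*\b\2 - any 0+ characters other than line break characters, as many as possible, followed by the same value as captured in Group 2 (matched as a whole word thanks to the word boundary)
    \s*=\s* - a = enclosed with 0+ whitespace characters
('\w*') - Group 5 (referred to with the \5 backreference from the replacement pattern): a ', zero or more word characters (you may also use [^']* to match 0+ characters other than '), and a '
import re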
rx = r"\b((ab_snus|und_ben)\s*=\s*)('\w*')(.*\b\2\s*=\s*)('\w*')"
s = ("update dbo.pc set ab_snus = '1' where ab_pb = 'aeiou' and ab_snus = '2'\n"
"update dbo.und set und_ben = '' where und_k = 'UB' AND und_ben = 'Bl'\n"
"update dbo.pc set ab_snus = '3' where ab_pb = 'aeiou' and ab_snus = '4'\n"
"update dbo.und set und_ben = '' where und_k = 'PC' AND und_ben = 'Bo'")
# Swap groups 3 and 5 (the two quoted values); groups 1 and 4 stay in place
result = re.sub(rx, r"\1\5\4\3", s)
print(result)
Result:
update dbo.pc set ab_snus = '2' where ab_pb = 'aeiou' and ab_snus = '1'
update dbo.und set und_ben = 'Bl' where und_k = 'UB' AND und_ben = ''
update dbo.pc set ab_snus = '4' where ab_pb = 'aeiou' and ab_snus = '3'
update dbo.und set und_ben = 'Bo' where und_k = 'PC' AND und_ben = ''
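The [^']* alternative mentioned in the details matters once a value contains characters outside \w, such as a space or a hyphen. Here is a minimal sketch, assuming such values can occur and never contain a ' themselves. The helper name swap_values and the sample line with 'Bl ue' are made up for illustration and are not part of the question's data:

```python
import re

# Same pattern as above, but with [^']* instead of \w* so that quoted values
# may contain spaces, hyphens, etc. (assumption: values never contain a ')
RX = re.compile(r"\b((ab_snus|und_ben)\s*=\s*)('[^']*')(.*\b\2\s*=\s*)('[^']*')")

def swap_values(sql_text):
    """Swap the first and last quoted value of ab_snus / und_ben on each line."""
    return RX.sub(r"\1\5\4\3", sql_text)

# Made-up sample line for illustration only
sample = "update dbo.und set und_ben = '' where und_k = 'UB' AND und_ben = 'Bl ue'"
print(swap_values(sample))
```

With \w* the value 'Bl ue' would not match as a whole, so that line would be left unchanged.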
Answer 1 (score: 1)
Substitute in two passes with the re.sub() function:
import re
text = '''
update dbo.pc set ab_snus = '1' where ab_pb = 'aeiou' and ab_snus = '2'
update dbo.und set und_ben = '' where und_k = 'UB' AND und_ben = 'Bl'
update dbo.pc set ab_snus = '3' where ab_pb = 'aeiou' and ab_snus = '4'
update dbo.und set und_ben = '' where und_k = 'PC' AND und_ben = 'Bo'
'''
text = re.sub(r"(update .+\bab_snus = ')([^']*)(' .+\bab_snus = ')([^']*)'", "\\1\\4\\3\\2'", text)
text = re.sub(r"(update .+\bund_ben = ')([^']*)(' .+\bund_ben = ')([^']*)'", "\\1\\4\\3\\2'", text)
print(text)
Output:
update dbo.pc set ab_snus = '2' where ab_pb = 'aeiou' and ab_snus = '1'
update dbo.und set und_ben = 'Bl' where und_k = 'UB' AND und_ben = ''
update dbo.pc set ab_snus = '4' where ab_pb = 'aeiou' and ab_snus = '3'
update dbo.und set und_ben = 'Bo' where und_k = 'PC' AND und_ben = ''
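If more columns need the same treatment, the two passes above could be folded into a loop with one re.sub() pass per column name. This is only a sketch of that idea: swap_column_values is a hypothetical helper name, and it assumes the same "name = 'value'" spacing as in the sample data:

```python
import re

def swap_column_values(text, columns):
    """Run one re.sub pass per column name, swapping its two quoted values."""
    for col in columns:
        name = re.escape(col)  # guard against regex metacharacters in the name
        pattern = (r"(update .+\b" + name + r" = ')([^']*)"
                   r"(' .+\b" + name + r" = ')([^']*)'")
        # Groups 2 and 4 are the bare values; emit them in swapped order
        text = re.sub(pattern, r"\1\4\3\2'", text)
    return text

text = ("update dbo.pc set ab_snus = '1' where ab_pb = 'aeiou' and ab_snus = '2'\n"
        "update dbo.und set und_ben = '' where und_k = 'UB' AND und_ben = 'Bl'")
print(swap_column_values(text, ["ab_snus", "und_ben"]))
```

Because . does not match a newline by default, each pass only ever swaps values within a single line.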
Answer 2 (score: 1)
Why use a regular expression when you can do this task without one, like this:
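with open('current.txt','r') as f:
    for line in f:
        # Tokens 5 and 13 are the two quoted values on every line
        data=line.split()
        data[5],data[13]=data[13],data[5]
        # Append the rewritten line to the output file
        with open('new_file.txt','a') as ff:
            ff.write(" ".join(data)+'\n')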
Output:
update dbo.pc set ab_snus = '2' where ab_pb = 'aeiou' and ab_snus = '1'
update dbo.und set und_ben = 'Bl' where und_k = 'UB' AND und_ben = ''
update dbo.pc set ab_snus = '4' where ab_pb = 'aeiou' and ab_snus = '3'
update dbo.und set und_ben = 'Bo' where und_k = 'PC' AND und_ben = ''
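The split-based approach depends on every line having the same token layout, so a blank or shorter line would raise an IndexError, and a value containing a space would shift the positions. A slightly more defensive sketch, where swap_tokens is a hypothetical helper and the default positions 5 and 13 are taken from the sample data:

```python
def swap_tokens(line, i=5, j=13):
    """Swap whitespace-separated tokens i and j; return short lines unchanged."""
    data = line.split()
    if len(data) <= max(i, j):
        # Not enough tokens (e.g. a blank line): leave the line as it is
        return line.rstrip("\n")
    data[i], data[j] = data[j], data[i]
    return " ".join(data)

lines = [
    "update dbo.pc set ab_snus = '1' where ab_pb = 'aeiou' and ab_snus = '2'\n",
    "\n",
    "update dbo.und set und_ben = '' where und_k = 'UB' AND und_ben = 'Bl'\n",
]
for line in lines:
    print(swap_tokens(line))
```

This still assumes that no quoted value contains whitespace; if that can happen, one of the regex answers is the safer choice.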