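How can I use a Python regex to trim a captured group so that it does not end with whitespace, when whitespace may follow the group?

Time: 2019-04-10 02:06:05

Tags: python regex python-3.x python-regex

I am trying to replace a string like ( self, False ) with (self, False). The regex I am using: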

import re

s = re.compile(r'\(\s*(.*)\s*\)')
s.sub(r'(\1)', '(    self, False   )')

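This returns (self, False   ), with the trailing whitespace still inside the parentheses.

How can I capture the group inside the parentheses without the trailing whitespace?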

3 answers:

Answer 0: (score: 1)

Why not use a plain string replace to remove the spaces by replacing them with an empty string?

text = '(    self, False   )'
print(text.replace(' ', ''))
# (self,False)
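
One caveat is worth checking here: str.replace removes every space, including the one after the comma, so the result is not quite the (self, False) the question asks for. A small sketch (the variable names are only illustrative):

```python
text = '(    self, False   )'

# str.replace removes every space character, not only those next to the parentheses.
stripped_everywhere = text.replace(' ', '')
print(stripped_everywhere)  # (self,False)

# The space after the comma is gone as well, so this differs from the desired result.
print(stripped_everywhere == '(self, False)')  # False
```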

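Answer 1: (score: 1)

Unaccept this answer; the simpler solution shown further down is the better choice.

Try this

Edit: updated, since you mentioned that the pattern appears inside longer text

Edit 2: updated for the case where the parentheses contain a single term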

# TEST 1
>>> import re
>>> text = '(    self, False   )'
>>> re.sub(r'(\()([\s]*?)((?:[\S]+?[\s]*?(?!\))+[\S]*?)|(?:[\S]+?(?=[\s]*?\))))([\s]*?)(\))', r'\1\3\5', text)

# OUTPUT
'(self, False)'

# TEST 2
>>> import re
>>> text = '''TEbh eyendd dkdkmfkf(    self, False   ) dduddnudmd (    self, False   )
(    self, False   ) fififfj m(    self, False   )kmiff ikifkifko kfmimfimfifi k
fkmfikfk kfmifm (    self, False   ) fififi,fo'''
>>> print(re.sub(r'(\()([\s]*?)((?:[\S]+?[\s]*?(?!\))+[\S]*?)|(?:[\S]+?(?=[\s]*?\))))([\s]*?)(\))', r'\1\3\5', text))

# OUTPUT
TEbh eyendd dkdkmfkf(self, False) dduddnudmd (self, False)
(self, False) fififfj m(self, False)kmiff ikifkifko kfmimfimfifi k
fkmfikfk kfmifm (self, False) fififi,fo

# TEST 3
>>> import re
>>> text = '''TEbh eyendd dkdkmfkf(    self) dduddnudmd (    self)
(    self, False   ) fififfj m(    self, False)kmiff ikifkifko kfmimfimfifi k
fkmfikfk kfmifm (    self, False   ) fififi,fo
(self   ) dndnd (self   ) fufufjiri (    self   ) (self   ) (    self)(    self)(self   )(    self   )(self   )(    self   )'''
>>> print(re.sub(r'(\()([\s]*?)((?:[\S]+?[\s]*?(?!\))+[\S]*?)|(?:[\S]+?(?=[\s]*?\))))([\s]*?)(\))', r'\1\3\5', text))

# OUTPUT
TEbh eyendd dkdkmfkf(self) dduddnudmd (self)
(self, False) fififfj m(self, False)kmiff ikifkifko kfmimfimfifi k
fkmfikfk kfmifm (self, False) fififi,fo
(self) dndnd (self) fufufjiri (self) (self) (self)(self)(self)(self)(self)(self)

Pi's simpler solution:

>>> import re
>>> text = '''TEbh eyendd dkdkmfkf(    self) dduddnudmd (    self)
(    self, False   ) fififfj m(    self, False)kmiff ikifkifko kfmimfimfifi k
fkmfikfk kfmifm (    self, False   ) fififi,fo
(self   ) dndnd (self   ) fufufjiri (    self   ) (self   ) (    self)(    self)(self   )(    self   )(self   )(    self   )'''
>>> print(re.sub(r'(\()\s*([\S\s]*?)\s*(\))', r'\1\2\3', text))

# OUTPUT
TEbh eyendd dkdkmfkf(self) dduddnudmd (self)
(self, False) fififfj m(self, False)kmiff ikifkifko kfmimfimfifi k
fkmfikfk kfmifm (self, False) fififi,fo
(self) dndnd (self) fufufjiri (self) (self) (self)(self)(self)(self)(self)(self)
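
As an alternative to doing all the trimming inside the pattern, the same effect can probably be had by matching the whole parenthesised group and calling str.strip() on its content in a replacement callback. This is only a sketch, not part of the answer above: strip_inside_parens is a hypothetical helper name, and the pattern assumes the parentheses are not nested.

```python
import re

def strip_inside_parens(text):
    """Trim whitespace just inside each pair of parentheses (hypothetical helper)."""
    # [^()]* matches the content of one non-nested pair of parentheses;
    # the callback rebuilds the pair around the stripped content.
    return re.sub(r'\(([^()]*)\)', lambda m: '(' + m.group(1).strip() + ')', text)

print(strip_inside_parens('m(    self, False   )kmiff (self   ) (    self)'))
# m(self, False)kmiff (self) (self)
```

Whitespace between terms, such as the space after the comma, is left alone because strip() only touches the two ends of the captured content.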

Answer 2: (score: 1)

Found a simple solution.

import re

s = re.compile(r'\(\s*(.*?)\s*\)')
s.sub(r'(\1)', 'hi hello ble ble ( self, False   ) ( self      ) (self , greedy    ) (    hello)')
# Output
'hi hello ble ble (self, False) (self) (self , greedy) (hello)'

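According to the Python re documentation:

  The '*', '+', and '?' qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn't desired; if the RE <.*> is matched against '<a> b <c>', it will match the entire string, and not just '<a>'. Adding ? after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using the RE <.*?> will match only '<a>'.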
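
The difference can be seen directly with a short sketch. The first half reproduces the example from the documentation; the second half applies the same idea to the patterns from the question and from this answer (the variable names are only illustrative):

```python
import re

# The example from the re documentation: greedy versus non-greedy.
sample = '<a> b <c>'
print(re.match(r'<.*>', sample).group())   # <a> b <c>
print(re.match(r'<.*?>', sample).group())  # <a>

# The same effect on the capture group from the question.
text = '(    self, False   )'
greedy = re.search(r'\(\s*(.*)\s*\)', text).group(1)
lazy = re.search(r'\(\s*(.*?)\s*\)', text).group(1)
print(repr(greedy))  # 'self, False   '
print(repr(lazy))    # 'self, False'
```

With the greedy (.*), the group swallows the trailing spaces and the following \s* is left matching nothing; with the lazy (.*?), the group stops as early as it can and \s* consumes the spaces before the closing parenthesis.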