Removing acronyms with a regular expression, based on uppercase characters after a parenthesis

Time: 2021-05-04 12:13:43

Tags: python regex string uppercase re

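How can I remove the following?

  • Acronyms that begin with an opening parenthesis followed by uppercase letters or digits, e.g. '(ABC', '(ABC)', '(ABC-2A)' or '(ABC-1)'.

But NOT parenthesized words that begin with an uppercase letter followed by lowercase letters, e.g. '(Bobby)' or '(Bob to the beach..)'. This is the part I'm struggling with.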


import re

text = ['(ABC went to the beach', 'The girl (ABC-2A) is walking', 'The dog (Bobby) is being walked', 'They are there (ABC)' ]
for string in text:
  cleaned_acronyms = re.sub(r'\([A-Z]*\)?', '', string)
  print(cleaned_acronyms)

#current output:
>> 'went to the beach' #Correct
>>'The girl -2A) is walking' #Not correct
>>'The dog obby) is being walked' #Not correct
>>'They are there' #Correct


#desired & correct output:
>> 'went to the beach'
>>'The girl is walking'
>>'The dog (Bobby) is being walked' #(Bobby) is NOT an acronym (uppercase+lowercase)
>>'They are there'

3 answers:

Answer 0 (score: 2)

Use \([A-Z\-0-9]{2,}\)? in the following context:

import re

text = ['(ABC went to the beach', 'The girl (ABC-2A) is walking', 'The dog (Bobby) is being walked', 'They are there (ABC)' ]
for string in text:
  cleaned_acronyms = re.sub(r'\([A-Z\-0-9]{2,}\)?', '', string)
  print(cleaned_acronyms)

I got these results:

' went to the beach'
'The girl  is walking'
'The dog (Bobby) is being walked'
'They are there '
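
The results above still contain the whitespace that surrounded each removed acronym (a leading space, a doubled space, a trailing space). Here is a minimal sketch of one way to tidy that up, assuming it is acceptable to collapse every run of whitespace into a single space. The helper name `strip_acronyms` is hypothetical and not part of the original answer:

```python
import re

# Pattern from this answer: "(" followed by 2+ uppercase letters, digits
# or hyphens, then an optional ")".
ACRONYM = re.compile(r'\([A-Z\-0-9]{2,}\)?')

def strip_acronyms(s):
    """Remove acronyms, then collapse leftover whitespace (hypothetical helper)."""
    removed = ACRONYM.sub('', s)
    # split() with no arguments splits on any whitespace run and drops
    # empty strings, so join() rebuilds the text with single spaces.
    return ' '.join(removed.split())

text = ['(ABC went to the beach', 'The girl (ABC-2A) is walking',
        'The dog (Bobby) is being walked', 'They are there (ABC)']
for string in text:
    print(strip_acronyms(string))
```

Because of the {2,} quantifier, a single-character token such as '(A)' would be left in place; whether that is desirable depends on the data.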

Answer 1 (score: 2)

Try a negative lookahead:

\((?![A-Z][a-z])[A-Z\d-]+\)?\s*


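  • \( - A literal opening parenthesis.
  • (?![A-Z][a-z]) - A negative lookahead asserting that the position is not followed by an uppercase letter and then a lowercase letter.
  • [A-Z\d-]+ - Match 1+ uppercase letters, digits or hyphens.
  • \)? - An optional literal closing parenthesis.
  • \s* - 0+ whitespace characters.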
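
The breakdown above can also be kept next to the pattern itself. The following is a small sketch, not part of the original answer, that writes the same regex with Python's `re.VERBOSE` flag so each piece carries its own comment; the name `ACRONYM` is an illustrative choice:

```python
import re

# Same pattern as above, spelled out with re.VERBOSE so that whitespace in
# the pattern is ignored and "#" starts a comment.
ACRONYM = re.compile(r"""
    \(                # literal opening parenthesis
    (?![A-Z][a-z])    # not followed by an uppercase then a lowercase letter
    [A-Z\d-]+         # 1+ uppercase letters, digits or hyphens
    \)?               # optional literal closing parenthesis
    \s*               # 0+ whitespace characters
""", re.VERBOSE)

print(ACRONYM.sub('', 'The girl (ABC-2A) is walking'))
print(ACRONYM.sub('', 'The dog (Bobby) is being walked'))
```

The first call removes '(ABC-2A) ' together with its trailing space; the second leaves the string untouched because the lookahead rejects '(Bo'.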

A sample Python script:

import re
text = ['(ABC went to the beach', 'The girl (ABC-2A) is walking', 'The dog (Bobby) is being walked', 'They are there (ABC)' ]
for string in text:
  cleaned_acronyms = re.sub(r'\((?![A-Z][a-z])[A-Z\d-]+\)?\s*', '', string)
  print(cleaned_acronyms)

Prints:

went to the beach
The girl is walking
The dog (Bobby) is being walked
They are there

Answer 2 (score: 1)

Use the pattern \([A-Z0-9\-]+\)

For example:

import re

text = ['ABC went to the beach', 'The girl (ABC-2A) is walking', 'The dog (Bobby) is being walked', 'They are there (ABC)' ]
ptrn = re.compile(r"\([A-Z0-9\-]+\)")
for i in text:
    print(ptrn.sub("", i))

Output:

ABC went to the beach
The girl  is walking
The dog (Bobby) is being walked
They are there
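
One thing worth noting about this pattern is that it requires the closing parenthesis, so an unclosed acronym such as '(ABC' from the question would be left in place (the sample list in this answer uses 'ABC went to the beach' without the parenthesis). The sketch below illustrates that, and also why simply making \) optional is not enough on its own. The variable names are illustrative only:

```python
import re

strict = re.compile(r"\([A-Z0-9\-]+\)")      # pattern from this answer
optional = re.compile(r"\([A-Z0-9\-]+\)?")   # same pattern with ")" made optional

# The strict pattern needs ")" right after the acronym, so nothing matches here.
unclosed = strict.sub("", "(ABC went to the beach")

# With an optional ")", a single uppercase letter is enough to match "(B".
bobby = optional.sub("", "The dog (Bobby) is being walked")

print(unclosed)
print(bobby)
```

The first call returns the string unchanged, and the second strips '(B' out of '(Bobby)'. That is presumably why the other answers add either a minimum length ({2,}) or a negative lookahead.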