Program to extract abbreviations and their definitions: trouble when the defining words are all lowercase

Time: 2019-06-04 15:14:09

Tags: python regex text text-parsing python-regex

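I have a program that finds abbreviations (that is, words in parentheses) and then, based on the number of characters in the abbreviation, goes back that many words to build the definition. So far it works for definitions where every preceding word starts with an uppercase letter, or where most of the preceding words do. In the latter case it skips lowercase words (such as "in") and moves on to the next word. My problem is the case where the corresponding words are all lowercase.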

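Current output:

  All Awesome Dudes (AAD)
  Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT)
  Trials (IMMPACT). Some patient perfer the usual care (UC)

Desired output:

  All Awesome Dudes (AAD)
  Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT)
  usual care (UC)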

import re

s = """Too many people, but not All Awesome Dudes (AAD) only care about the 
Initiative on Methods, Measurement, and Pain Assessment in Clinical 
Trials (IMMPACT). Some patient perfer the usual care (UC) approach of 
doing nothing"""
allabbre = []

for match in re.finditer(r"\((.*?)\)", s):
    start_index = match.start()
    abbr = match.group(1)
    size = len(abbr)
    words = s[:start_index].split()
    count = 0
    # Walk backwards, counting only words that start with an uppercase letter
    for k, i in enumerate(words[::-1]):
        if i[0].isupper():
            count += 1
        if count == size:
            break
    words = words[-k-1:]
    definition = " ".join(words)
    abbr_keywords = definition + " " + "(" + abbr + ")"
    pattern = '[A-Z]'

    # Only report parenthesized text that contains a capital letter
    if re.search(pattern, abbr):
        if abbr_keywords not in allabbre:
            allabbre.append(abbr_keywords)
        print(abbr_keywords)

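1 answer:

Answer 0: (score: 1)

Match the first letter of each preceding word against the letters of the abbreviation, working from right to left. The flag is for rare cases such as All are Awesome Dudes (AAD): when the word just before the parenthesis is capitalized, first letters are compared case-sensitively, so lowercase words like "are" are skipped; otherwise each first letter is uppercased before comparing.
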
import re

s = """Too many people, but not All Awesome Dudes (AAD) only care about the 
Initiative on Methods, Measurement, and Pain Assessment in Clinical 
Trials (IMMPACT). Some patient perfer the usual care (UC) approach of 
doing nothing
"""
allabbre = []

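for match in re.finditer(r"\((.*?)\)", s):
    start_index = match.start()
    abbr = match.group(1)
    size = len(abbr)
    words = s[:start_index].split()
    if not words:
        # Nothing precedes the parenthesis, so there is no definition to build
        continue
    # Index of the abbreviation letter still to be matched (right to left)
    count = size - 1
    # True when the word right before the parenthesis is capitalized
    flag = words[-1][0].isupper()
    for k, i in enumerate(words[::-1]):
        # Case-sensitive comparison for capitalized phrases, case-insensitive otherwise
        first_letter = i[0] if flag else i[0].upper()
        if first_letter == abbr[count]:
            count -= 1
        if count == -1:
            break
    words = words[-k-1:]
    definition = " ".join(words)
    abbr_keywords = definition + " " + "(" + abbr + ")"
    pattern = '[A-Z]'

    # Only report parenthesized text that contains a capital letter
    if re.search(pattern, abbr):
        if abbr_keywords not in allabbre:
            allabbre.append(abbr_keywords)
        print(abbr_keywords)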
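
In case it helps, here is the same right-to-left matching wrapped in a function, so the rare case mentioned above can be tried directly. This is only a sketch: the function name `find_definitions` and the variables `case_sensitive`, `remaining`, and `taken` are my own and do not appear in the code above. I also assumed it is fine to skip a parenthesis that has no words before it.

```python
import re


def find_definitions(text):
    """Return 'definition (ABBR)' strings found in text (hypothetical helper)."""
    results = []
    for match in re.finditer(r"\((.*?)\)", text):
        abbr = match.group(1)
        words = text[:match.start()].split()
        # Assumption: ignore parentheses with no preceding words or no capital letter
        if not words or not re.search(r"[A-Z]", abbr):
            continue
        remaining = len(abbr) - 1                 # abbreviation letter still to match
        case_sensitive = words[-1][0].isupper()   # plays the role of `flag` above
        taken = 0                                 # number of words consumed so far
        for word in reversed(words):
            taken += 1
            initial = word[0] if case_sensitive else word[0].upper()
            if initial == abbr[remaining]:
                remaining -= 1
            if remaining < 0:
                break
        entry = " ".join(words[-taken:]) + " (" + abbr + ")"
        if entry not in results:
            results.append(entry)
    return results


print(find_definitions("They say All are Awesome Dudes (AAD) here"))
# ['All are Awesome Dudes (AAD)']
print(find_definitions("Some patients prefer the usual care (UC) approach"))
# ['usual care (UC)']
```

In the first call the lowercase "are" is skipped because the phrase ends in a capitalized word, so the match continues back to "All". In the second call every initial is uppercased before comparing, which is what makes the all-lowercase "usual care" work. It is still a heuristic, so phrases that mix cases in other ways may need extra handling.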