Retrieve the full definition of a parenthesized abbreviation, based on the number of letters it contains

Date: 2019-06-02 02:45:23

Tags: python regex text text-parsing abbreviation

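I have a situation where I need to retrieve the definition of an acronym based on the number of letters enclosed in parentheses. For the data I'm dealing with, the number of letters in the parentheses corresponds to the number of words to retrieve. I know this isn't a reliable method for finding abbreviations in general, but in my case it is. For example:

String = 'Although family health history (FHH) is commonly accepted as an important risk factor for common, chronic diseases, it is rarely considered by a nurse practitioner (NP).'

Desired output: family health history (FHH), nurse practitioner (NP)

I know how to extract the parentheses from a string, but after that I am stuck. Any help is appreciated.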

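import re

# Adjacent string literals inside parentheses are joined into one string
a = ('Although family health history (FHH) is commonly accepted as an '
     'important risk factor for common, chronic diseases, it is rarely considered '
     'by a nurse practitioner (NP).')

# Find every parenthesized group, parentheses included
x2 = re.findall(r'(\(.*?\))', a)

for x in x2:
    # Note: this length still counts the two parentheses
    length = len(x)
    print(x, length)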

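5 answers:

Answer 0: (score: 4)

Use a regex match to find the position where each match starts. Then use Python string indexing to get the substring leading up to the start of the match. Split that substring into words and take the last n words, where n is the length of the abbreviation.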

import re
s = 'Although family health history (FHH) is commonly accepted as an important risk factor for common, chronic diseases, it is rarely considered by a nurse practitioner (NP).'


for match in re.finditer(r"\((.*?)\)", s):
    start_index = match.start()
    abbr = match.group(1)
    size = len(abbr)
    words = s[:start_index].split()[-size:]
    definition = " ".join(words)

    print(abbr, definition)

This prints:

FHH family health history
NP nurse practitioner
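
The same steps can be wrapped in a function that produces the output format asked for in the question. This is only a sketch: the name `expand_abbreviations` is hypothetical, and the pattern is narrowed to uppercase letters (`[A-Z]+`) on the assumption that other parenthesized text should be ignored.

```python
import re

def expand_abbreviations(text):
    """Return 'definition (ABBR)' strings for each parenthesized abbreviation."""
    results = []
    # Assumption: abbreviations consist of uppercase ASCII letters only
    for match in re.finditer(r"\(([A-Z]+)\)", text):
        abbr = match.group(1)
        # Take as many preceding words as the abbreviation has letters
        words = text[:match.start()].split()[-len(abbr):]
        results.append(f"{' '.join(words)} ({abbr})")
    return results

s = ('Although family health history (FHH) is commonly accepted as an '
     'important risk factor for common, chronic diseases, it is rarely '
     'considered by a nurse practitioner (NP).')
print(", ".join(expand_abbreviations(s)))
# family health history (FHH), nurse practitioner (NP)
```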

Answer 1: (score: 2)

An idea: use a recursive pattern with the PyPI regex module.

\b[A-Za-z]+\s+(?R)?\(?[A-Z](?=[A-Z]*\))\)?

See this pcre demo at regex101

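  • \b[A-Za-z]+\s+ matches a word boundary, one or more letters, then one or more whitespace characters
  • (?R)? is the recursive part: it optionally pastes the whole pattern in again from the start
  • \(? makes the opening parenthesis optional so that the recursion fits, and \)? does the same for the closing one
  • [A-Z](?=[A-Z]*\)) matches one uppercase letter if it is followed by a closing ), with any number of A-Z in between

Limitations:

  1. It does not check whether the first letter of each word actually matches the letter at the corresponding position in the abbreviation.
  2. It does not check for an opening parenthesis in front of the abbreviation. To check, add a variable-length lookbehind: change [A-Z](?=[A-Z]*\)) to (?<=\([A-Z]*)[A-Z](?=[A-Z]*\))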
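
The first limitation can also be handled outside the regex. The sketch below does not use the recursive pattern; it swaps in plain string slicing with the standard `re` module and then compares initials. The helper names `initials_match` and `checked_abbreviations` are hypothetical, and the comparison assumes one word per letter.

```python
import re

def initials_match(definition, abbr):
    """True if the words' first letters spell the abbreviation (case-insensitive)."""
    words = definition.split()
    return len(words) == len(abbr) and all(
        w[0].upper() == c for w, c in zip(words, abbr.upper())
    )

def checked_abbreviations(text):
    """Return (abbr, definition) pairs whose initials agree with the abbreviation."""
    pairs = []
    # The literal \( also requires an opening parenthesis before the letters
    for m in re.finditer(r"\(([A-Z]+)\)", text):
        abbr = m.group(1)
        definition = " ".join(text[:m.start()].split()[-len(abbr):])
        if initials_match(definition, abbr):
            pairs.append((abbr, definition))
    return pairs

print(checked_abbreviations("the family health history (FHH) and the red car (XYZ)"))
# [('FHH', 'family health history')]
```

Here "(XYZ)" is dropped because the three preceding words, "the red car", do not start with X, Y and Z.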

Answer 2: (score: 1)

Does this solve your problem?

data = [12,b'c', 100009, b"string", 3.45]

stringformat of data = "icl6sd"

packed data =b'\x0c\x00\x00\x00c\x00\x00\x00\xa9\x86\x01
\x00\x00\x00\x00\x00string\x00\x00\x9a\x99\x99\x99\x99\x99\x0b@'

Actually, Keatinge beat me to it

Answer 3: (score: 0)

Use re together with a list comprehension

# `a` is the input string from the question; raw strings avoid invalid-escape warnings
x_lst = [str(len(i[1:-1])) for i in re.findall(r'(\(.*?\))', a)]

[re.search(r'(\S+\s+){' + i + r'}\(.{' + i + r'}\)', a).group(0) for i in x_lst]
#['family health history (FHH)', 'nurse practitioner (NP)']

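Answer 4: (score: 0)

This solution isn't particularly clever. It simply searches for the acronyms, then builds a pattern to extract the words that precede each one: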

import re

string = "Although family health history (FHH) is commonly accepted as an important risk factor for common, chronic diseases, it is rarely considered by a nurse practitioner (NP)."

definitions = []

for acronym in re.findall(r'\(([A-Z]+?)\)', string):
    length = len(acronym)

    match = re.search(r'(?:\w+\W+){' + str(length) + r'}\(' + acronym + r'\)', string)

    definitions.append(match.group(0))

print(", ".join(definitions))

Output

> python3 test.py
family health history (FHH), nurse practitioner (NP)
>