Question

Let's say I have a text file. I need to read it, and it looks like this:

 ... Department of Something is called (DoS) and then more texts and more text...

Then, while I am reading the text file, I find an acronym, which here is

DoS

So, to find the acronym, I wrote:

import re
import numpy

# open the file? 
test_string = " a lot of text read from file ... Department of Something is called (DoS) and then more texts and more text..."
regex = r'\b[A-Z][a-zA-Z\.]*[A-Z]\b\.?'

found = re.findall(regex, test_string)
print(found)

and the output is:

['DoS']

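What I want to do is:

1. While reading the file, find an acronym (here it is DoS).
2. Count the number of characters in it (here, DoS has 3 characters).
3. Find twice that many words (here 2 x 3 = 6) before and after 'DoS'. That would be:

3.1 pre=     Department of Something is called
3.2 acronym= DoS
3.3 post=    and then more texts and more

4. Put these 3 (pre, acronym, post) in an array.

Any help would be appreciated, as I am new to Python.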

Answer 1

Not sure whether this is the best solution, but maybe it is enough to help you.

The code below will give you the pre, acronym, and post parts:

import re

# In practice the text would be read from the file.
test_string = " a lot of text read from file ... Department of Something is called (DoS) and then more texts and more text..."
regex_acronym = r'\b[A-Z][a-zA-Z\.]*[A-Z]\b\.?'

ra = re.compile(regex_acronym)
for m in ra.finditer(test_string):
    print(m.start(), m.group(), m.span())
    n = len(m.group()) * 2  # number of words wanted on each side
    # Up to n words before the acronym, the acronym itself, up to n words after it.
    regex_pre_post = r"((?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,%d})(" % n
    regex_pre_post += regex_acronym
    regex_pre_post += r")((?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,%d})" % n
    found = re.findall(regex_pre_post, test_string)
    print(found)

    found = found[0]  # For a single match, just do this.
    pre = found[0]    # still contains the punctuation next to the acronym, e.g. "("
    acro = found[1]
    post = found[2]
    print(pre, acro, post)
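
If the text can contain more than one acronym, a variation that may be easier to follow is to split the text around each match into words and slice them, instead of building one combined regex. This is only a sketch under my own assumptions: the names acronym_contexts, ACRONYM and WORD are made up for this example, a "word" is taken to be a run of letters, apostrophes and hyphens (same as in the code above), and punctuation between words is dropped.

```python
import re

# Same acronym pattern as in the question.
ACRONYM = re.compile(r'\b[A-Z][a-zA-Z.]*[A-Z]\b\.?')
# Assumption: a word is a run of letters, apostrophes and hyphens.
WORD = re.compile(r"[a-zA-Z'-]+")


def acronym_contexts(text):
    """Return a list of [pre, acronym, post] for every acronym in text."""
    results = []
    for m in ACRONYM.finditer(text):
        n = 2 * len(m.group())                   # words to keep on each side
        before = WORD.findall(text[:m.start()])  # all words left of the acronym
        after = WORD.findall(text[m.end():])     # all words right of the acronym
        results.append([" ".join(before[-n:]), m.group(), " ".join(after[:n])])
    return results


text = (" a lot of text read from file ... Department of Something is "
        "called (DoS) and then more texts and more text...")
for pre, acro, post in acronym_contexts(text):
    print("pre=", pre)
    print("acronym=", acro)
    print("post=", post)
```

Each entry of the returned list is one [pre, acronym, post] group, so the whole result can be used as the array the question asks for. Note that for the sample text, 6 words before DoS reach back to "file", one word further than the 5 words listed in the question.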

Python: find n words before and after a certain word
