Let's say I have a text file. I need to read it, and it looks something like this:
... Department of Something is called (DoS) and then more texts and more text...
Then, while I am reading the text file, I come across an acronym, in this case:
DoS
To find the acronym, I wrote:
import re

# open the file?
test_string = " a lot of text read from file ... Department of Something is called (DoS) and then more texts and more text..."
regex = r'\b[A-Z][a-zA-Z\.]*[A-Z]\b\.?'
found = re.findall(regex, test_string)
print(found)
and the output is:
['DoS']
What I want to do is:
Find 2 x len(acronym) words (here 2 x 3 = 6) before and after 'DoS'. That would be:
3.1 pre= Department of Something is called
3.2 acronym= DoS
3.3 post= and then more texts and more
Any help would be appreciated, as I am new to Python.
Answer 0 (score: 1)
Not sure this is the best solution, but it may be enough to help you.
Here is the code:
import re

# open the file?
test_string = " a lot of text read from file ... Department of Something is called (DoS) and then more texts and more text..."
regex_acronym = r'\b[A-Z][a-zA-Z\.]*[A-Z]\b\.?'
ra = re.compile(regex_acronym)
for m in ra.finditer(test_string):
    print(m.start(), m.group(), m.span())
    # Capture up to 2 * len(acronym) words on each side of the acronym.
    n = len(m.group()) * 2
    regex_pre_post = r"((?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,%d})(" % n
    regex_pre_post += regex_acronym
    regex_pre_post += r")((?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,%d})" % n
    found = re.findall(regex_pre_post, test_string)
    print(found)
    found = found[0]  # For a single match, just do this.
    # pre and post keep the adjacent punctuation, e.g. "(" and ")".
    pre = found[0]
    acro = found[1]
    post = found[2]
    print(pre, acro, post)
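
If the text may contain more than one acronym, or you would rather not get the surrounding punctuation back, a token-based variant might be easier to follow. This is only a sketch, not a replacement for the regex above: the helper name acronym_context and its factor parameter are made up for illustration, and it assumes a "word" is a run of letters, apostrophes, or hyphens, the same character class used above. It reuses the acronym pattern from the question.

```python
import re

# Same acronym pattern as in the question.
ACRONYM = re.compile(r'\b[A-Z][a-zA-Z.]*[A-Z]\b\.?')
# Assumption: a "word" is a run of letters, apostrophes, or hyphens.
WORD = re.compile(r"[A-Za-z'-]+")


def acronym_context(text, factor=2):
    """Yield (pre, acronym, post) for every acronym found in text.

    pre and post each hold up to factor * len(acronym) words.
    (Hypothetical helper, named here for illustration only.)
    """
    for m in ACRONYM.finditer(text):
        acro = m.group()
        n = factor * len(acro)
        # Tokenize the text on each side of this particular match.
        before = WORD.findall(text[:m.start()])
        after = WORD.findall(text[m.end():])
        # max() guards the slice when fewer than n words are available.
        pre = " ".join(before[max(0, len(before) - n):])
        post = " ".join(after[:n])
        yield pre, acro, post


test_string = (" a lot of text read from file ... Department of Something "
               "is called (DoS) and then more texts and more text...")
for pre, acro, post in acronym_context(test_string):
    print(pre, "|", acro, "|", post)
# prints: file Department of Something is called | DoS | and then more texts and more
```

Because each match is handled through its own m.start() and m.end(), every acronym gets its own context window. Note that len(acro) also counts any dots in the acronym, just like len(m.group()) in the code above.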