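Regex to match two strings in one group

Time: 2016-04-14 09:29:52

Tags: regex python-3.x

I am trying to match two strings in a text file, and I wrote a function to do it. The pattern itself is valid (I tested it with an online regex tester), but only one of the strings is printed and the other is ignored.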

NVRAM_info.txt

NvRam is available, BlockSize is : 0x00001000
            Max. datasize is : 0x00040000

2 answers:

Answer 0 (score: 0)

Try this:

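import re

NVRAM_INFO = "NVRAM_info.txt"

# Read the whole file so the pattern can span both lines.
with open(NVRAM_INFO, 'r') as file:
    test_str = file.read()

# Raw string avoids invalid-escape warnings; re.DOTALL lets .*? cross newlines.
p = re.compile(r'BlockSize is : (\dx\d+)\n.*?Max\. datasize is : (\dx\d+)', re.DOTALL)

# With two capture groups, findall returns a list of (BlockSize, Maxsize) tuples.
g = re.findall(p, test_str)

Maxsize = g[0][1]
BlockSize = g[0][0]
print(Maxsize)
print(BlockSize)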

Output:

0x00040000
0x00001000
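
The shape of what re.findall returns can be checked without the file. The following is a minimal sketch that assumes the two sample lines from the question are pasted into an inline string; the names SAMPLE, pattern, block_size and max_size are introduced here for illustration only.

```python
import re

# Hypothetical inline copy of the two lines shown in the question.
SAMPLE = (
    "NvRam is available, BlockSize is : 0x00001000\n"
    "            Max. datasize is : 0x00040000\n"
)

pattern = re.compile(
    r"BlockSize is : (\dx\d+)\n.*?Max\. datasize is : (\dx\d+)", re.DOTALL
)

# With two capture groups, findall yields one (group1, group2) tuple per match.
matches = re.findall(pattern, SAMPLE)
block_size, max_size = matches[0]

print(matches)                                  # [('0x00001000', '0x00040000')]
print(int(block_size, 16), int(max_size, 16))   # 4096 262144
```

Both values come back from a single match, so indexing the first tuple gives the two strings together.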

Answer 1 (score: 0)

Although the other answer works, it can be made more efficient, as shown below. If s is the text to search, then

reg = r'BlockSize is : (0x\d{8})\n\s*Max\. datasize is : (0x\d{8})'

In [62]: pat = re.compile(reg)

In [64]: blocksize, maxsize = pat.search(s).groups()

In [65]: blocksize, maxsize
Out[65]: ('0x00001000', '0x00040000')
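
Note that pat.search(s) returns None when nothing matches, so calling .groups() on it directly would raise AttributeError. A small guarded wrapper might look like the sketch below; parse_nvram and PAT are hypothetical names, not part of the original answer.

```python
import re

PAT = re.compile(r"BlockSize is : (0x\d{8})\n\s*Max\. datasize is : (0x\d{8})")


def parse_nvram(text):
    """Return (blocksize, maxsize) as strings, or None if the text does not match."""
    m = PAT.search(text)
    if m is None:
        return None
    return m.groups()


# Inline copy of the sample lines from the question.
s = (
    "NvRam is available, BlockSize is : 0x00001000\n"
    "            Max. datasize is : 0x00040000\n"
)

print(parse_nvram(s))                # ('0x00001000', '0x00040000')
print(parse_nvram("no sizes here"))  # None
```

One caveat: \d matches decimal digits only, so a value containing hex letters (A to F) would need a class such as [0-9A-Fa-f] instead.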

Now that we know it works, let's see whether it is more efficient (compared with @Tim007's answer).

In [66]: timeit pat.search(s).groups()
The slowest run took 8.41 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.38 µs per loop

In [74]: timeit re.findall(p, s)  # @Tim007's answer
The slowest run took 4.94 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.51 µs per loop

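So it is about 2.31 times faster (5.51 µs versus 2.38 µs per loop). Using \d{8} instead of \d+ helps efficiency, because a more specific pattern is faster. Second, this version is less error-prone because it does not rely on the re.DOTALL flag; it matches the line break explicitly with \n.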
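
Outside IPython, the comparison can be repeated with the standard timeit module. This is only a sketch: the inline string s, the names p_findall and p_search, and the loop count are assumptions, and the absolute numbers will differ by machine and Python version. The two calls also differ in more than the pattern (findall versus search), so the ratio reflects both.

```python
import re
import timeit

# Inline copy of the sample lines from the question.
s = (
    "NvRam is available, BlockSize is : 0x00001000\n"
    "            Max. datasize is : 0x00040000\n"
)

p_findall = re.compile(
    r"BlockSize is : (\dx\d+)\n.*?Max\. datasize is : (\dx\d+)", re.DOTALL
)
p_search = re.compile(r"BlockSize is : (0x\d{8})\n\s*Max\. datasize is : (0x\d{8})")

n = 20_000  # arbitrary loop count chosen for this sketch

# timeit.timeit returns the total time in seconds for n calls.
t_findall = timeit.timeit(lambda: re.findall(p_findall, s), number=n)
t_search = timeit.timeit(lambda: p_search.search(s).groups(), number=n)

print(f"findall: {t_findall / n * 1e6:.2f} microseconds per call")
print(f"search:  {t_search / n * 1e6:.2f} microseconds per call")
```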

  

Given the choice, it is usually better to define the regular expression pattern so that it works correctly without extra flags. (Beazley)
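
To see how the flag changes behavior, the sketch below runs both patterns on a hypothetical input in which an unrelated line sits between the two fields. The names with_flag, explicit and text are introduced here for illustration. Which result is desirable depends on the file format; the point is only that the explicit \n\s* version states exactly what may appear between the two fields.

```python
import re

with_flag = re.compile(
    r"BlockSize is : (\dx\d+)\n.*?Max\. datasize is : (\dx\d+)", re.DOTALL
)
explicit = re.compile(r"BlockSize is : (0x\d{8})\n\s*Max\. datasize is : (0x\d{8})")

# Hypothetical input: an unrelated line separates the two fields.
text = (
    "BlockSize is : 0x00001000\n"
    "unrelated line\n"
    "Max. datasize is : 0x00040000\n"
)

# With re.DOTALL, .*? is allowed to run across the extra line.
print(with_flag.search(text) is not None)  # True

# \n\s* only allows whitespace between the fields, so this does not match.
print(explicit.search(text) is not None)   # False
```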