Question

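I am trying to match two strings in a text file, and I wrote a function to do the matching. The pattern itself is valid (I tested it with an online regex tester), but only one of the strings is printed and the other is ignored.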

NVRAM_info.txt

# Note: isPrime is not defined in this snippet.
counter = 0
number = 2
while counter < 1000:
    if isPrime(number):
        counter = counter + 1
    number = number + 1
print(number)


NvRam is available, BlockSize is : 0x00001000
            Max. datasize is : 0x00040000

Answer 1

Try this:

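import re

NVRAM_INFO = "NVRAM_info.txt"

# Read the whole file so the pattern can span both lines.
with open(NVRAM_INFO, 'r') as file:
    test_str = file.read()

# Raw string so \d reaches the regex engine unchanged;
# re.DOTALL lets .*? cross the line break.
p = re.compile(r'BlockSize is : (\dx\d+)\n.*?Max\. datasize is : (\dx\d+)', re.DOTALL)

# findall returns a list of (BlockSize, Maxsize) tuples, one per match.
g = re.findall(p, test_str)

Maxsize = g[0][1]
BlockSize = g[0][0]
print(Maxsize)
print(BlockSize)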

Output:

0x00040000
0x00001000

Answer 2

While the existing answers work, this can be done more efficiently, as shown below. If s is the string to search, then:

reg = r'BlockSize is : (0x\d{8})\n\s*Max\. datasize is : (0x\d{8})'

In [62]: pat = re.compile(reg)

In [64]: blocksize, maxsize = pat.search(s).groups()

In [65]: blocksize, maxsize
Out[65]: ('0x00001000', '0x00040000')
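Outside an interactive session, the same idea can be sketched as a small script. This is only a sketch: the sample text is copied from the question, and the helper name parse_nvram is a hypothetical name chosen for illustration. It also guards against the case where the pattern does not match, since search() returns None then.

```python
import re

# Sample text copied from NVRAM_info.txt in the question.
s = (
    "NvRam is available, BlockSize is : 0x00001000\n"
    "            Max. datasize is : 0x00040000\n"
)

# Same pattern as above: no flags, the line break is matched explicitly.
pat = re.compile(r'BlockSize is : (0x\d{8})\n\s*Max\. datasize is : (0x\d{8})')


def parse_nvram(text):
    """Return (blocksize, maxsize) as strings, or None if nothing matches.

    parse_nvram is a hypothetical helper name, not part of the original answer.
    """
    m = pat.search(text)
    return m.groups() if m else None


print(parse_nvram(s))  # ('0x00001000', '0x00040000')
```

Note that \d{8} only matches decimal digits, so a value containing the hex digits a-f would not match. If the file can contain such values, a character class like [0-9A-Fa-f]{8} might be needed; that is an assumption about the file format, not something the question states.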

Now that we know it works, let's see whether it is more efficient (compared with @Tim007's answer).

In [66]: timeit pat.search(s).groups()
The slowest run took 8.41 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 2.38 µs per loop

In [74]: timeit  re.findall(p, s) # @Tim007's answer
The slowest run took 4.94 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 5.51 µs per loop
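The timings above come from IPython's timeit magic. For anyone who wants to repeat the comparison in a plain script, here is a rough sketch using the standard timeit module. The sample string is copied from the question, the loop count n is an arbitrary choice, and the lambda wrappers add a little call overhead, so the absolute numbers will differ from those shown above and from machine to machine.

```python
import re
import timeit

# Sample text copied from NVRAM_info.txt in the question.
s = (
    "NvRam is available, BlockSize is : 0x00001000\n"
    "            Max. datasize is : 0x00040000\n"
)

# p is the pattern from @Tim007's answer, pat is the pattern from this answer.
p = re.compile(r'BlockSize is : (\dx\d+)\n.*?Max. datasize is : (\dx\d+)', re.DOTALL)
pat = re.compile(r'BlockSize is : (0x\d{8})\n\s*Max\. datasize is : (0x\d{8})')

n = 100000  # number of loops, chosen arbitrarily for this sketch

t_search = timeit.timeit(lambda: pat.search(s).groups(), number=n)
t_findall = timeit.timeit(lambda: re.findall(p, s), number=n)

# Report the average time per call in microseconds.
print("search : %.2f us per loop" % (t_search / n * 1e6))
print("findall: %.2f us per loop" % (t_findall / n * 1e6))
```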

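So it is about 2.31 times faster (5.51 µs versus 2.38 µs). Using \d{8} instead of \d+ improves efficiency, because a more specific pattern matches faster. Second, this version is less error-prone, because it does not rely on the re.DOTALL flag and instead matches the line break explicitly with \n.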

Given the choice, it is usually best to define a regular expression pattern so that it works correctly without needing extra flags. (Beazley)
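One way to see why avoiding re.DOTALL can be safer is a small experiment. The input below is hypothetical (it is not from the question): the "Max. datasize" line that should follow BlockSize is missing, and an unrelated one appears further down. With re.DOTALL, .*? is free to cross any number of lines, so the two values get paired anyway; the flag-free pattern simply fails to match.

```python
import re

# Hypothetical input, invented for illustration only.
text = (
    "NvRam is available, BlockSize is : 0x00001000\n"
    "            (no size reported)\n"
    "Other device\n"
    "            Max. datasize is : 0x00999999\n"
)

with_flag = re.compile(
    r'BlockSize is : (\dx\d+)\n.*?Max\. datasize is : (\dx\d+)', re.DOTALL
)
no_flag = re.compile(
    r'BlockSize is : (0x\d{8})\n\s*Max\. datasize is : (0x\d{8})'
)

# .*? spans the intermediate lines and pairs two unrelated values.
print(with_flag.findall(text))  # [('0x00001000', '0x00999999')]

# Only whitespace is allowed between the two lines, so there is no match.
print(no_flag.search(text))  # None
```

Whether the looser behavior is a problem depends on what the real file can contain, so treat this as an illustration of the trade-off rather than a rule.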

Regex to match two strings in one group

2 answers: