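I am trying to match two strings in a text file. I wrote a function to perform the matching of the two strings. Although the pattern is valid, only one string is printed and the other one is ignored. I checked the match with an online regex tester.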
NVRAM_info.txt
NvRam is available, BlockSize is : 0x00001000
Max. datasize is : 0x00040000
Answer 0 (score: 0)
Try this:
import re

NVRAM_INFO = "NVRAM_info.txt"

# Read the whole file so the pattern can span both lines
with open(NVRAM_INFO, 'r') as f:
    test_str = f.read()

# Raw string keeps \d intact; re.DOTALL lets .*? cross the newline
p = re.compile(r'BlockSize is : (\dx\d+)\n.*?Max\. datasize is : (\dx\d+)', re.DOTALL)
g = re.findall(p, test_str)
Maxsize = g[0][1]
BlockSize = g[0][0]
print(Maxsize)
print(BlockSize)
Output:
0x00040000
0x00001000
Answer 1 (score: 0)
While the answers given work, the match can be made more efficient, as shown below. If s is the string to be searched, then
reg = r'BlockSize is : (0x\d{8})\n\s*Max\. datasize is : (0x\d{8})'
In [62]: pat = re.compile(reg)
In [64]: blocksize, maxsize = pat.search(s).groups()
In [65]: blocksize, maxsize
Out[65]: ('0x00001000', '0x00040000')
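As a rough sketch, the pattern above could be wrapped in a small helper for use outside an interactive session. The function name parse_nvram_info and the inline sample text are assumptions made for illustration; they are not part of the original answer. The helper returns None instead of raising when the text does not match.

```python
import re

# Same pattern as above: fixed-width values, explicit newline, no flags.
PAT = re.compile(r'BlockSize is : (0x\d{8})\n\s*Max\. datasize is : (0x\d{8})')

def parse_nvram_info(text):
    """Return (blocksize, maxsize) as strings, or None if nothing matches.

    parse_nvram_info is a hypothetical helper name used for illustration.
    """
    m = PAT.search(text)
    if m is None:
        return None
    return m.groups()

# Sample text copied from the question's NVRAM_info.txt
sample = (
    "NvRam is available, BlockSize is : 0x00001000\n"
    "Max. datasize is : 0x00040000\n"
)

result = parse_nvram_info(sample)
if result is not None:
    blocksize, maxsize = result
    print(blocksize, maxsize)                    # the matched strings
    print(int(blocksize, 16), int(maxsize, 16))  # the same values as integers
```

Checking for None first avoids the AttributeError that pat.search(s).groups() would raise on a file that does not contain both lines.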
Now that we know it works, let's see whether it is more efficient (compared with @Tim007's answer).
In [66]: timeit pat.search(s).groups()
The slowest run took 8.41 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 2.38 µs per loop
In [74]: timeit re.findall(p, s) # @Tim007's answer
The slowest run took 4.94 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 5.51 µs per loop
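So it is about 2.31 times faster (5.51 µs versus 2.38 µs per loop). Using \d{8} instead of \d+ helps efficiency, because the more specific the pattern, the faster it tends to be. Second, this version is less error-prone because it does not use the re.DOTALL flag and matches \n explicitly instead.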
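Outside IPython, the timing comparison could be reproduced roughly as follows with the standard timeit module. This is a sketch under assumptions: the string s is an inline copy of the sample file from the question, and the absolute numbers will vary by machine and Python version, so no particular ratio is guaranteed.

```python
import re
import timeit

# Inline copy of the sample file contents (assumed input)
s = ("NvRam is available, BlockSize is : 0x00001000\n"
     "Max. datasize is : 0x00040000\n")

# Pattern from the first answer (relies on re.DOTALL)
p = re.compile(r'BlockSize is : (\dx\d+)\n.*?Max. datasize is : (\dx\d+)', re.DOTALL)
# Pattern from this answer (no flags)
pat = re.compile(r'BlockSize is : (0x\d{8})\n\s*Max\. datasize is : (0x\d{8})')

n = 100000  # same loop count that IPython reported
t_findall = timeit.timeit(lambda: re.findall(p, s), number=n)
t_search = timeit.timeit(lambda: pat.search(s).groups(), number=n)

# Convert total seconds to microseconds per call
print("findall: %.2f us per loop" % (t_findall / n * 1e6))
print("search:  %.2f us per loop" % (t_search / n * 1e6))
```

Note that the two statements differ in more than the pattern (findall builds a list, search stops at the first match), so the measured gap reflects both choices.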
If given a choice, it is usually best to define your regular expression pattern so that it works correctly without the need for extra flags. (Beazley)
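To illustrate why the flag-free pattern can be the safer choice, here is a small sketch comparing the two patterns on a made-up input in which an unrelated line sits between the two values. The input text is hypothetical and not taken from the question; whether skipping such a line is desirable depends on the actual file format.

```python
import re

# First answer's approach: .*? with re.DOTALL may cross any number of lines
loose = re.compile(r'BlockSize is : (\dx\d+)\n.*?Max\. datasize is : (\dx\d+)', re.DOTALL)
# Second answer's approach: only whitespace may follow the explicit newline
strict = re.compile(r'BlockSize is : (0x\d{8})\n\s*Max\. datasize is : (0x\d{8})')

# Hypothetical input with an extra line between the two values
text = (
    "NvRam is available, BlockSize is : 0x00001000\n"
    "Some unrelated line\n"
    "Max. datasize is : 0x00040000\n"
)

loose_result = loose.findall(text)    # pairs the values across the extra line
strict_result = strict.search(text)   # None: the lines are not adjacent

print(loose_result)
print(strict_result)
```

With re.DOTALL the lazy .*? silently pairs values that are not on adjacent lines, while the explicit \n\s* version only matches when the second line directly follows the first.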