I have the following string:
background:url('http://images.bloomingdales.com/is/image/BLM/?&$b=BLM/swatches/&layer=0&size=322,23&src=is{$b$1/optimized/8757901_fpx.tif}&cropN=0,0,14,1&anchor=0,0&layer=1&size=23,23&src=is{$b$2/optimized/8757902_fpx.tif}&anchor=0,0&posN=0.071,0&layer=2&size=23,23&src=is{$b$4/optimized/8234544_fpx.tif}&anchor=0,0&posN=0.143,0&layer=3&size=23,23&src=is{$b$7/optimized/1111977_fpx.tif}&anchor=0,0&posN=0.214,0&layer=4&size=23,23&src=is{$b$0/optimized/8538460_fpx.tif}&anchor=0,0&posN=0.286,0&layer=5&size=23,23&src=is{$b$5/optimized/8234545_fpx.tif}&anchor=0,0&posN=0.357,0&layer=6&size=23,23&src=is{$b$3/optimized/1111973_fpx.tif}&anchor=0,0&posN=0.429,0&layer=7&size=23,23&src=is{$b$7/optimized/1252857_fpx.tif}&anchor=0,0&posN=0.5,0&layer=8&size=23,23&src=is{$b$8/optimized/1252858_fpx.tif}&anchor=0,0&posN=0.571,0&layer=9&size=23,23&src=is{$b$7/optimized/8234547_fpx.tif}&anchor=0,0&posN=0.643,0&layer=10&size=23,23&src=is{$b$0/optimized/8757900_fpx.tif}&anchor=0,0&posN=0.714,0&layer=11&size=23,23&src=is{$b$0/optimized/1111970_fpx.tif}&anchor=0,0&posN=0.786,0&layer=12&size=23,23&src=is{$b$1/optimized/1111971_fpx.tif}&anchor=0,0&posN=0.857,0&layer=13&size=23,23&src=is{$b$2/optimized/1111972_fpx.tif}&anchor=0,0&posN=0.929,0&layer=14&op_sharpen=1&fmt=jpeg&qlt=90,0&hei=23')
322px 0 transparent;
I need to extract all of these parts:
1/optimized/8757901_fpx.tif
2/optimized/8757902_fpx.tif
and so on.
I am using this regular expression:
re.findall(re.compile(r'\d{1,2}/optimized/.+\.tif'), swatch)
It returns the wrong result, a single match that runs from the first path to the last .tif:
['1/optimized/8757901_fpx.tif}&cropN=0,0,14,1&anchor=0,0&layer=1&size=23,23&src=is{$b$2/optimized/8757902_fpx.tif}&anchor=0,0&posN=0.071,0&layer=2&size=23,23&src=is{$b$4/optimized/8234544_fpx.tif}&anchor=0,0&posN=0.143,0&layer=3&size=23,23&src=is{$b$7/optimized/1111977_fpx.tif}&anchor=0,0&posN=0.214,0&layer=4&size=23,23&src=is{$b$0/optimized/8538460_fpx.tif}&anchor=0,0&posN=0.286,0&layer=5&size=23,23&src=is{$b$5/optimized/8234545_fpx.tif}&anchor=0,0&posN=0.357,0&layer=6&size=23,23&src=is{$b$3/optimized/1111973_fpx.tif}&anchor=0,0&posN=0.429,0&layer=7&size=23,23&src=is{$b$7/optimized/1252857_fpx.tif}&anchor=0,0&posN=0.5,0&layer=8&size=23,23&src=is{$b$8/optimized/1252858_fpx.tif}&anchor=0,0&posN=0.571,0&layer=9&size=23,23&src=is{$b$7/optimized/8234547_fpx.tif}&anchor=0,0&posN=0.643,0&layer=10&size=23,23&src=is{$b$0/optimized/8757900_fpx.tif}&anchor=0,0&posN=0.714,0&layer=11&size=23,23&src=is{$b$0/optimized/1111970_fpx.tif}&anchor=0,0&posN=0.786,0&layer=12&size=23,23&src=is{$b$1/optimized/1111971_fpx.tif}&anchor=0,0&posN=0.857,0&layer=13&size=23,23&src=is{$b$2/optimized/1111972_fpx.tif']
I tested this regex on regex101.com and it works fine there: https://regex101.com/r/tV9kU8/1#
Answer 0 (score: 3)
re.findall(r'\d{1,2}/optimized/.+?\.tif', swatch)
                                ^^
Make `.+` non-greedy by adding `?` to the quantifier.
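To see the difference concretely, here is a minimal sketch that runs both patterns against a shortened string. The `sample` value is a hypothetical two-layer excerpt in the same shape as the original string, not the full URL.

```python
import re

# Hypothetical, shortened excerpt in the same shape as the string in the question.
sample = "src=is{$b$1/optimized/8757901_fpx.tif}&layer=1&src=is{$b$2/optimized/8757902_fpx.tif}&anchor=0,0"

# Greedy: .+ consumes as much as possible, then backtracks to the LAST ".tif".
greedy = re.findall(r'\d{1,2}/optimized/.+\.tif', sample)

# Lazy: .+? consumes as little as possible, stopping at the FIRST ".tif".
lazy = re.findall(r'\d{1,2}/optimized/.+?\.tif', sample)

print(len(greedy))  # 1
print(lazy)         # ['1/optimized/8757901_fpx.tif', '2/optimized/8757902_fpx.tif']
```

The greedy pattern produces one long match spanning both paths, which is the behaviour reported in the question; the lazy pattern produces one match per path.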
Answer 1 (score: 2)
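Instead of the greedy `.+`, use the quantifier in ungreedy (lazy) mode: `.+?`.
That way, your regex never matches more characters than necessary between `/` and `.tif`; that is, each match extends only as far as the next occurrence of `.tif`.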
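One way to check that claim is to inspect each match and confirm it contains exactly one `.tif`, at its very end. This is a small sketch under the assumption that `sample` (a hypothetical, shortened excerpt) is representative of the real string.

```python
import re

# Hypothetical, shortened excerpt in the same shape as the string in the question.
sample = "src=is{$b$1/optimized/8757901_fpx.tif}&layer=1&src=is{$b$2/optimized/8757902_fpx.tif}&anchor=0,0"

lazy_pattern = re.compile(r'\d{1,2}/optimized/.+?\.tif')

# finditer yields match objects, so the position of each match is available too.
matches = [(m.start(), m.group(0)) for m in lazy_pattern.finditer(sample)]

for start, text in matches:
    # With the lazy quantifier, every match stops at the first ".tif" it reaches.
    print(start, text, text.count(".tif"))
```

If any match reported a count greater than one, the quantifier would still be running past a `.tif` boundary.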
Answer 2 (score: 1)
You can use a non-greedy group in your regex (note that in your own pattern you need to add `?` after `+` to make it non-greedy):
>>> re.findall(re.compile(r'{\$b\$(.*?)}'), s)
['1/optimized/8757901_fpx.tif', '2/optimized/8757902_fpx.tif',
'4/optimized/8234544_fpx.tif', '7/optimized/1111977_fpx.tif',
'0/optimized/8538460_fpx.tif', '5/optimized/8234545_fpx.tif',
'3/optimized/1111973_fpx.tif', '7/optimized/1252857_fpx.tif',
'8/optimized/1252858_fpx.tif', '7/optimized/8234547_fpx.tif',
'0/optimized/8757900_fpx.tif', '0/optimized/1111970_fpx.tif',
'1/optimized/1111971_fpx.tif', '2/optimized/1111972_fpx.tif']
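Since all of your image paths come right after a literal `$b$` (written `\$b\$` in the regex), you can use the following pattern:
{\$b\$(.*?)}
It matches anything that follows `$b$` inside `{}`, and the capture group returns only that part.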
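As a self-contained sketch of this approach: the `sample` string below is a hypothetical, shortened excerpt, and the braces are escaped here only to make the intent explicit (the unescaped form shown in this answer produced the output above).

```python
import re

# Hypothetical, shortened excerpt in the same shape as the string in the question.
sample = "src=is{$b$1/optimized/8757901_fpx.tif}&layer=1&src=is{$b$2/optimized/8757902_fpx.tif}&anchor=0,0"

# With exactly one capture group, re.findall returns the group contents
# rather than the whole match, so the surrounding "{$b$" and "}" are dropped.
paths = re.findall(r'\{\$b\$(.*?)\}', sample)

print(paths)  # ['1/optimized/8757901_fpx.tif', '2/optimized/8757902_fpx.tif']
```

Note the trade-off: this pattern returns whatever sits between `{$b$` and `}`, whether or not it ends in `.tif`, so it relies on the URL keeping that structure.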