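I need help extracting prices from HTML. I have already extracted the movie titles, and now I need the prices. I tried a lookbehind regex, but when I use \n.* inside it I get an error saying "a quantifier inside a lookbehind makes it non-fixed width". I need the first and second prices that appear in the text.
Regexes I tried: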
(?<=Hello<\/a>.*\n.*\n.*\n.*\n.*\n.*\n.*\n.*\n.*?(\$)
and
Hello<\/a>.*\n.*\n.*\n.*\n.*\n.*\n.*\n.*\n.*?(\$)
but neither worked.
Text:
<a class="blue_link" href="http://www.ebgames.com.au/Games/sjbeiub108723">Hello:</a>
<div class="hi">
<p>Including <a class="blue_link">
<p>Price$<data1>40.00</p>
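Please help, and thank you :)
Answer 0 (score: 0)
You can use this regex with the DOTALL flag, capturing the prices in groups instead of using a lookbehind: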
import re
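# Raw string so \$ and \d reach the regex engine unchanged; lazy .+? stops at the nearest price.
r = r"The Durrells: Series 2.+?\$(\d+\.\d+).+?\$(\d+\.\d+)"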
text = ''' <a class="blue_link fn url" href="http://www.fishpond.com.au/Movies/Durrells-Series-2-Keeley-Hawes/5014138609450">The Durrells: Series 2</a>
<div class="by">
<p>Starring <a class="blue_link" href="http://www.fishpond.com.au/c/Movies/s/Keeley+Hawes">Keeley Hawes</a>, <a class="blue_link" href="http://www.fishpond.com.au/c/Movies/s/Milo+Parker">Milo Parker</a>, <a class="blue_link" href="http://www.fishpond.com.au/c/Movies/s/Josh+O%27Connor">Josh O'Connor</a>, <a class="blue_link" href="http://www.fishpond.com.au/c/Movies/s/Daisy+Waterstone">Daisy Wat...</a></p>
<div class="productSearch-metainfo">
DVD (UK), May 2017 </div>
</div>
</div></td>
<td align="right" style="vertical-align:top;"><div class="productSearch-price-container">
<span class="rrp-label">Elsewhere</span> <s>$30.53</s> <span class="productSpecialPrice"><b>$27.46</b></span> <div style="white-space:nowrap;"> <span class="you_save">Save 10%</span> </div><span class="free-shipping">with Free Shipping!</span></div>
'''
print(re.findall(r, text, re.DOTALL))
Output:
[('30.53', '27.46')]
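If you need this for more than one title, the same idea can be wrapped in a small helper. This is only a sketch: the function name `first_two_prices` and the shortened `sample` markup are made up for illustration, and it assumes both prices appear after the title as `$` followed by digits, a dot, and more digits. Python's `re` module only accepts fixed-width lookbehinds, which is why the prices are captured with groups here rather than anchored with `(?<=...)`.

```python
import re

def first_two_prices(html, title):
    """Return the first two dollar amounts that follow `title`, or None."""
    # re.escape keeps any regex metacharacters in the title literal.
    # The lazy '.+?' stops at the nearest price rather than the last one.
    pattern = re.escape(title) + r".+?\$(\d+\.\d+).+?\$(\d+\.\d+)"
    # re.DOTALL lets '.' match newlines, so no chain of '.*\n' is needed.
    match = re.search(pattern, html, re.DOTALL)
    return match.groups() if match else None

# Hypothetical, shortened markup with a third price to show the lazy matching.
sample = '''<a class="blue_link" href="#">The Durrells: Series 2</a>
<div class="by">
<s>$30.53</s> <b>$27.46</b> <span>was $35.00</span>
'''

print(first_two_prices(sample, "The Durrells: Series 2"))
```

With greedy `.+` the match would backtrack from the end of the text and pick up the last two prices instead, so the lazy form is the safer choice when more than two prices follow the title. For anything beyond a quick scrape, an HTML parser is usually more robust than a regex.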