Regex to select text a few lines after a match

Date: 2017-05-18 05:58:11

Tags: python html regex

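I need help selecting the prices in some HTML. I have already extracted the movie title, and now I need to extract the prices. I tried a lookbehind regex, but when I put \n.* inside it I get the error "A quantifier inside a lookbehind makes it non-fixed width". I need the first and second prices in the text.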

The regexes I tried:

(?<=Hello<\/a>.*\n.*\n.*\n.*\n.*\n.*\n.*\n.*\n.*?(\$)

Hello<\/a>.*\n.*\n.*\n.*\n.*\n.*\n.*\n.*\n.*?(\$)

Neither of them worked.
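A minimal sketch of why the lookbehind attempt fails in Python, and one common workaround. Python's built-in re module only accepts fixed-width lookbehinds, so a .* inside (?<=...) is rejected when the pattern is compiled (the third-party regex package does allow variable-width lookbehinds). The usual alternative is to match the anchor text normally and capture only the price in a group. The sample string is hypothetical: it uses a plain $40.00 price rather than the Price$<data1>40.00 placeholder from the question.

```python
import re

# Hypothetical sample: an anchor ("Hello</a>") followed, a few lines later, by a price.
sample = 'Hello</a>\n    <div class="hi">\n        <p>Price $40.00</p>'

# A lookbehind containing a quantifier such as .* has no fixed width,
# so Python's re module refuses to compile it and raises re.error.
try:
    re.compile(r"(?<=Hello</a>.*)\$")
    lookbehind_ok = True
except re.error as exc:
    lookbehind_ok = False
    print("re.error:", exc)

# Workaround: consume the anchor and the lines in between with an ordinary
# lazy .*? and capture only the number. re.DOTALL lets "." match newlines.
match = re.search(r"Hello</a>.*?\$(\d+\.\d+)", sample, re.DOTALL)
price = match.group(1) if match else None
print(price)  # 40.00
```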

Text:

<a class="blue_link" href="http://www.ebgames.com.au/Games/sjbeiub108723">Hello:</a>
    <div class="hi">
        <p>Including <a class="blue_link"> 
<p>Price$<data1>40.00</p>

Please help, and thank you all :)

1 answer:

Answer 0 (score: 0)

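You can use this regex with the DOTALL flag, which lets "." also match newlines so the pattern can span the lines between the title and the prices: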

import re

# Raw string (r"...") so that \$, \d and \. reach the regex engine unchanged
# instead of triggering invalid-escape warnings in the Python string literal.
# .+? is lazy: it stops at the first price after the title, then at the next one.
pattern = r"The Durrells: Series 2.+?\$(\d+\.\d+).+?\$(\d+\.\d+)"

text = ''' <a class="blue_link fn url" href="http://www.fishpond.com.au/Movies/Durrells-Series-2-Keeley-Hawes/5014138609450">The Durrells: Series 2</a>
    <div class="by">
        <p>Starring <a class="blue_link" href="http://www.fishpond.com.au/c/Movies/s/Keeley+Hawes">Keeley Hawes</a>, <a class="blue_link" href="http://www.fishpond.com.au/c/Movies/s/Milo+Parker">Milo Parker</a>, <a class="blue_link" href="http://www.fishpond.com.au/c/Movies/s/Josh+O%27Connor">Josh O'Connor</a>, <a class="blue_link" href="http://www.fishpond.com.au/c/Movies/s/Daisy+Waterstone">Daisy Wat...</a></p>
        <div class="productSearch-metainfo">
DVD (UK), May 2017        </div>
    </div>
</div></td>
                    <td align="right" style="vertical-align:top;"><div class="productSearch-price-container">
<span class="rrp-label">Elsewhere</span>&nbsp;<s>$30.53</s>&nbsp;&nbsp;<span class="productSpecialPrice"><b>$27.46</b></span>&nbsp;<div style="white-space:nowrap;">&nbsp; &nbsp;<span class="you_save">Save 10%</span>&nbsp;</div><span class="free-shipping">with Free Shipping!</span></div>
'''

# re.DOTALL lets "." match newlines, so one match can cover several lines.
print(re.findall(pattern, text, re.DOTALL))

Output:

[('30.53', '27.46')]
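A minimal sketch of the two details this answer relies on: re.DOTALL, and how a greedy .+ differs from a lazy .+? once the page holds more than one product. The HTML below is a hypothetical, trimmed-down stand-in for a results page, with a made-up second product ("Another Title") appended.

```python
import re

# Hypothetical, trimmed-down listing with two products, each showing an
# "elsewhere" price in <s> and a special price in <b>.
page = """<a href="#">The Durrells: Series 2</a>
<div>DVD (UK)</div>
<s>$30.53</s> <b>$27.46</b>
<a href="#">Another Title</a>
<s>$19.99</s> <b>$15.00</b>
"""

greedy = r"The Durrells: Series 2.+\$(\d+\.\d+).+\$(\d+\.\d+)"
lazy = r"The Durrells: Series 2.+?\$(\d+\.\d+).+?\$(\d+\.\d+)"

# Without re.DOTALL, "." cannot cross the newline after the title,
# and there is no "$" on that line, so nothing matches.
no_dotall = re.findall(lazy, page)
print(no_dotall)  # []

# Greedy .+ first runs to the end of the text and then backtracks, so it
# lands on the LAST two prices on the page (the other product's).
greedy_hits = re.findall(greedy, page, re.DOTALL)
print(greedy_hits)  # [('19.99', '15.00')]

# Lazy .+? stops at the first price after the title, then the next one.
lazy_hits = re.findall(lazy, page, re.DOTALL)
print(lazy_hits)  # [('30.53', '27.46')]
```

On the single-product fragment in the answer, both the greedy and the lazy pattern return [('30.53', '27.46')]; the difference only shows up when more prices follow the one you want.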