Python Regex Scrape

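Time: 2015-11-20 11:24:18

Tags: python regex findall

I have a snippet of HTML from a product page that contains two prices (the full price and the monthly installment price), and I am trying to use Python to extract the full price (649).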

<span style="color: #404040; font-size: 12px;"> from </span>
<span class="money-int">649</span>
<sup class="money-decimal">99</sup>
<span class="money-currency">$</span>
<br />
<span style="color: #404040; font-size: 12px;">from 
    <b>
        <span class="money-int">37</span>
        <sup class="money-decimal">35</sup>
        <span class="money-currency">$</span>/month
    </b>
</span>

I tried using re.findall like this:

match = re.findall('\"money-int\"\>(\d*)\<\/span\>\<sup class=\"money-decimal\"\>(\d*)',content)

The problem is that I get a list containing both prices, 649 and 37, and I only need 649.

2 Answers:

Answer 0: (score: 0)

re.findall(r"<span[^>]*class=\"money-int\"[^>]*>([^<]*)</span>[^<]*<sup[^>]*class=\"money-decimal\"[^>]*>([^<]*)</sup>", YOUR_STRING)
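Note that re.findall returns every match, so the pattern above still yields both prices when run on the sample markup. A minimal sketch, assuming the wanted price is always the first one in the document: re.search stops at the first match. The names content and PRICE_RE are illustrative, and the pattern is a simplified variant of the one above with \s* added to tolerate line breaks between the tags.

```python
import re

# Sample markup copied from the question (assumed representative of the real page).
content = '''
<span style="color: #404040; font-size: 12px;"> from </span>
<span class="money-int">649</span>
<sup class="money-decimal">99</sup>
<span class="money-currency">$</span>
<br />
<span style="color: #404040; font-size: 12px;">from
    <b>
        <span class="money-int">37</span>
        <sup class="money-decimal">35</sup>
        <span class="money-currency">$</span>/month
    </b>
</span>
'''

# \s* tolerates whitespace or newlines between </span> and <sup ...>.
PRICE_RE = re.compile(
    r'class="money-int"[^>]*>(\d+)</span>\s*'
    r'<sup[^>]*class="money-decimal"[^>]*>(\d+)</sup>'
)

# re.search returns only the first match (or None), unlike re.findall.
match = PRICE_RE.search(content)
if match:
    price_int, price_decimal = match.groups()
    print(price_int)                        # integer part of the first price
    print(price_int + "." + price_decimal)  # integer and decimal parts joined
```

If findall is preferred, taking the first element of its result gives the same value, but the list has to be checked for emptiness first.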

Answer 1: (score: 0)

Consider using an HTML/XML parser for this job to avoid trouble later:

#!/usr/bin/python

from bs4 import BeautifulSoup as BS

html = '''
<span style="color: #404040; font-size: 12px;"> from </span>
<span class="money-int">649</span>
<sup class="money-decimal">99</sup>
<span class="money-currency">$</span>
<br />
<span style="color: #404040; font-size: 12px;">from
    <b>
        <span class="money-int">37</span>
        <sup class="money-decimal">35</sup>
        <span class="money-currency">$</span>/month
    </b>
</span>
'''

soup = BS(html, 'lxml')

# find_all returns matches in document order, so index 0 is the full price (649).
print(soup.find_all("span", attrs={"class": "money-int"})[0].get_text())

Online demo on ideone
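
If installing BeautifulSoup and lxml is not an option, the same "first money-int span" idea can be sketched with the standard library's html.parser module (Python 3). This is only an illustration under that assumption: the class name FirstMoneyIntParser and its price attribute are hypothetical, not part of any library.

```python
from html.parser import HTMLParser


class FirstMoneyIntParser(HTMLParser):
    """Hypothetical helper: keeps the text of the first <span class="money-int">."""

    def __init__(self):
        super().__init__()
        self._capturing = False  # True while inside the target span
        self.price = None        # text of the first matching span, once found

    def handle_starttag(self, tag, attrs):
        # Ignore everything once a price has been captured.
        if self.price is None and tag == "span":
            classes = (dict(attrs).get("class") or "").split()
            if "money-int" in classes:
                self._capturing = True

    def handle_data(self, data):
        if self._capturing:
            self.price = data.strip()
            self._capturing = False

    def handle_endtag(self, tag):
        # Stop capturing if the span closes without any text in it.
        if tag == "span":
            self._capturing = False


html = '''
<span style="color: #404040; font-size: 12px;"> from </span>
<span class="money-int">649</span>
<sup class="money-decimal">99</sup>
<span class="money-currency">$</span>
<br />
<span style="color: #404040; font-size: 12px;">from
    <b>
        <span class="money-int">37</span>
        <sup class="money-decimal">35</sup>
        <span class="money-currency">$</span>/month
    </b>
</span>
'''

parser = FirstMoneyIntParser()
parser.feed(html)
print(parser.price)
```

Because the parser stops recording after the first match, the installment price (37) in the second block is never picked up.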