Parsing with Beautiful Soup: finding a string (number) outside a span tag

Date: 2017-05-05 07:10:00

Tags: python-3.x beautifulsoup html-parsing

I managed to parse the following data using BeautifulSoup:

<span class="price-currency">$</span>200.00</span>, <span class="j-original-price">
<span class="price-currency">$</span>1,000.00</span>, <span class="j-original-price">
<span class="price-currency">$</span>1,300.00</span>, <span class="j-original-price">
<span class="price-currency">$</span>550.00 <span class="price-type price-type--negotiable">Negotiable</span></span>, <span class="j-original-price">
<span class="price-currency">$</span>450.00 <span class="price-type price-type--negotiable">Negotiable</span></span>, <span class="j-original-price">
<span class="price-currency">$</span>50.00 <span class="price-type price-type--negotiable">Negotiable</span></span>, <span class="j-original-price">

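Now I need to extract the number in the middle of each line. I thought nextSibling would work, but it failed. I also noticed that some numbers are followed by a closing span tag and others by an opening span tag.

How can I parse these numbers with BeautifulSoup? This is how I obtained the data above: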

span = soup("span", { "class" : "price-currency" })

Thanks.

2 answers:

Answer 0: (score: 0)

Try looping over your data and extracting the .price-currency spans from the soup:

[s.extract() for s in soup("span", {"class":"price-currency"})]

Then get the price values you need from:

list_price = soup("span", {"class":"j-original-price"})
print([pr.text for pr in list_price])
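
The two steps above can be combined into one self-contained sketch. The HTML here is a hypothetical, well-formed sample modeled on the question's data (each price wrapped in a complete j-original-price span), and removing the price-type spans as well is an extra assumption added so that only the number remains:

```python
from bs4 import BeautifulSoup

# Hypothetical sample: well-formed markup modeled on the question's data.
html = """
<span class="j-original-price"><span class="price-currency">$</span>200.00</span>
<span class="j-original-price"><span class="price-currency">$</span>550.00 <span class="price-type price-type--negotiable">Negotiable</span></span>
"""

soup = BeautifulSoup(html, "html.parser")

# Remove the currency symbol and the "Negotiable" label from the tree.
for css_class in ("price-currency", "price-type"):
    for tag in soup("span", {"class": css_class}):
        tag.extract()

# Only the number is left inside each price span; strip stray whitespace.
prices = [p.get_text(strip=True) for p in soup("span", {"class": "j-original-price"})]
print(prices)
```

Note that extract() modifies the soup in place, so this approach suits cases where the removed tags are not needed afterwards.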

Answer 1: (score: 0)

If the data is exactly as you posted it, .next_sibling works for me:

In [1]: from bs4 import BeautifulSoup

In [2]: data = """
   ...: <span class="price-currency">$</span>200.00</span>, <span class="j-original-price">
   ...: <span class="price-currency">$</span>1,000.00</span>, <span class="j-original-price">
   ...: <span class="price-currency">$</span>1,300.00</span>, <span class="j-original-price">
   ...: <span class="price-currency">$</span>550.00 <span class="price-type price-type--negotiable">Negotiable</span></span>, <span class="j-original-price">
   ...: <span class="price-currency">$</span>450.00 <span class="price-type price-type--negotiable">Negotiable</span></span>, <span class="j-original-price">
   ...: <span class="price-currency">$</span>50.00 <span class="price-type price-type--negotiable">Negotiable</span></span>, <span class="j-original-price">
   ...: """

In [3]: soup = BeautifulSoup(data, "html.parser")

In [4]: for item in soup("span", {"class": "price-currency"}):
   ...:     print(item.next_sibling)
   ...:     
200.00
1,000.00
1,300.00
550.00 
450.00 
50.00
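
Some of the values printed above carry trailing whitespace and thousands separators. If numeric values are needed, a minimal sketch along the following lines may help. The sample HTML is hypothetical, and the use of Decimal and the comma-stripping step are assumptions about the desired result, not part of the original answer:

```python
from decimal import Decimal

from bs4 import BeautifulSoup

# Hypothetical sample: well-formed markup modeled on the question's data.
html = """
<span class="j-original-price"><span class="price-currency">$</span>1,000.00</span>
<span class="j-original-price"><span class="price-currency">$</span>550.00 <span class="price-type price-type--negotiable">Negotiable</span></span>
"""

soup = BeautifulSoup(html, "html.parser")

amounts = []
for currency in soup("span", {"class": "price-currency"}):
    sibling = currency.next_sibling
    # next_sibling can be None or another tag; keep only text nodes.
    if not isinstance(sibling, str):
        continue
    # Drop surrounding whitespace and the thousands separator before converting.
    amounts.append(Decimal(sibling.strip().replace(",", "")))

print(amounts)
```

The isinstance check works because BeautifulSoup's text nodes (NavigableString) are a subclass of str, so a following tag or a missing sibling is skipped rather than raising an error.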