How do I get the contents of a class from HTML with BeautifulSoup?

Date: 2015-01-13 08:08:20

Tags: python html web-scraping beautifulsoup html-parsing

This is the HTML I want to work with:

<section id='price'>

<div class="row">
    <h4 class='col-sm-4'>Market Cap: <b><i class="fa fa-inr"></i> 10.64 Crores</b></h4>
    <h4 class='col-sm-4'>Current Price: <b><i class="fa fa-inr"></i> 35.35</b></h4>
    <h4 class='col-sm-4'>Book Value: <b><i class="fa fa-inr"></i> 53.52</b></h4>
</div>

My question is: how do I get the Market Cap, Current Price, and Book Value from the elements with class='col-sm-4'?

If I try this:

print soup.row.col-sm-4.fa.fa-inr

It doesn't work. I'm fairly new to Python and web scraping, so please walk me through the process. Thanks in advance.
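Why the attempt above fails: attribute access such as soup.row looks for the first tag *named* row (there is none here; row is a class on a div), and col-sm-4 is not a valid Python identifier, so Python tries to read the hyphens as minus signs. Classes are matched with find / find_all and the class_ keyword instead. A minimal sketch that follows the same path the attempt was aiming for (div.row, then h4.col-sm-4, then i.fa-inr), assuming bs4 is installed and using the built-in "html.parser"; a closing </section> is added so the snippet is complete:

```python
from bs4 import BeautifulSoup

# The markup from the question, with a closing </section> added
html = """
<section id='price'>
<div class="row">
    <h4 class='col-sm-4'>Market Cap: <b><i class="fa fa-inr"></i> 10.64 Crores</b></h4>
    <h4 class='col-sm-4'>Current Price: <b><i class="fa fa-inr"></i> 35.35</b></h4>
    <h4 class='col-sm-4'>Book Value: <b><i class="fa fa-inr"></i> 53.52</b></h4>
</div>
</section>
"""
soup = BeautifulSoup(html, "html.parser")

values = []
# class_ (trailing underscore) filters on the CSS class, because "class" is a Python keyword
row = soup.find("div", class_="row")
for h4 in row.find_all("h4", class_="col-sm-4"):
    # Matches because "fa-inr" is one of the icon's classes ("fa fa-inr")
    icon = h4.find("i", class_="fa-inr")
    # The value is the text node right after the icon, inside the same <b>
    values.append(icon.next_sibling.strip())

print(values)
```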

2 answers:

Answer 0 (score: 1)

You can find the text node by its contents and then take its next_element:

from bs4 import BeautifulSoup

data = """
<div class="row">
        <h4 class='col-sm-4'>Market Cap: <b><i class="fa fa-inr"></i> 10.64 Crores</b></h4>
        <h4 class='col-sm-4'>Current Price: <b><i class="fa fa-inr"></i> 35.35</b></h4>
        <h4 class='col-sm-4'>Book Value: <b><i class="fa fa-inr"></i> 53.52</b></h4>
    </div>
"""
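soup = BeautifulSoup(data, "html.parser")

titles = ['Market Cap', 'Current Price', 'Book Value']
for title in titles:
    # The matching text node is e.g. "Market Cap: "; its next_element is the <b> tag holding the value
    print(soup.find(string=lambda x: x.startswith(title)).next_element.get_text(strip=True))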

Prints:

10.64 Crores
35.35
53.52

To get a float value, split the value text on whitespace and take the first element:

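for title in titles:
    # Split the <b> text ("10.64 Crores"), not the label text ("Market Cap: ")
    price = soup.find(string=lambda x: x.startswith(title)).next_element.get_text(strip=True).split()[0]
    print(float(price))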
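Taking only the first element silently drops a unit word such as "Crores". A sketch that keeps it alongside the number, reusing the same next_element lookup; parse_amount is a hypothetical helper name, and the sketch assumes each value is a number optionally followed by a single unit word:

```python
from bs4 import BeautifulSoup

data = """
<div class="row">
    <h4 class='col-sm-4'>Market Cap: <b><i class="fa fa-inr"></i> 10.64 Crores</b></h4>
    <h4 class='col-sm-4'>Current Price: <b><i class="fa fa-inr"></i> 35.35</b></h4>
    <h4 class='col-sm-4'>Book Value: <b><i class="fa fa-inr"></i> 53.52</b></h4>
</div>
"""
soup = BeautifulSoup(data, "html.parser")


def parse_amount(text):
    """Hypothetical helper: split '10.64 Crores' into (10.64, 'Crores'); unit is None if absent."""
    parts = text.split()
    unit = parts[1] if len(parts) > 1 else None
    return float(parts[0]), unit


titles = ['Market Cap', 'Current Price', 'Book Value']
values = {}
for title in titles:
    # Label text node first, then the <b> tag that follows it
    label = soup.find(string=lambda s: s.startswith(title))
    values[title] = parse_amount(label.next_element.get_text(strip=True))

print(values)
```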

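You can also get them with a CSS selector (this one assumes the full HTML from the question, including the section with id price):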
for item in soup.select('section#price div.row h4.col-sm-4 b'):
    print(item.get_text(strip=True))
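Because the selector starts with section#price, it only matches when the parsed markup contains that section; the data string earlier in this answer begins at the div, so the selector finds nothing there. A small check of both cases, assuming bs4 4.7 or later (where select is backed by the soupsieve package) and "html.parser":

```python
from bs4 import BeautifulSoup

row_html = """
<div class="row">
    <h4 class='col-sm-4'>Market Cap: <b><i class="fa fa-inr"></i> 10.64 Crores</b></h4>
    <h4 class='col-sm-4'>Current Price: <b><i class="fa fa-inr"></i> 35.35</b></h4>
    <h4 class='col-sm-4'>Book Value: <b><i class="fa fa-inr"></i> 53.52</b></h4>
</div>
"""
# The same markup wrapped in the <section id='price'> from the question
page_html = "<section id='price'>" + row_html + "</section>"

selector = "section#price div.row h4.col-sm-4 b"
results = {}
for name, html in [("row only", row_html), ("full page", page_html)]:
    soup = BeautifulSoup(html, "html.parser")
    # Collect the stripped text of every <b> the selector matches
    results[name] = [b.get_text(strip=True) for b in soup.select(selector)]
    print(name, results[name])
```

If you only have the div, dropping the section#price part (for example "div.row h4.col-sm-4 b") is enough.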

Answer 1 (score: 0)

Try this:

>>> for x in soup.find_all("div", "row"):
...     print(x.text)
... 

Market Cap:  10.64 Crores
Current Price:  35.35
Book Value:  53.52
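The loop above prints the whole div text in one piece, labels and extra whitespace included. To get each field separately, one option is to walk the h4 elements and split each one's text on its first colon. A minimal sketch, assuming "html.parser" and that every label ends with a single colon before the value:

```python
from bs4 import BeautifulSoup

data = """
<div class="row">
    <h4 class='col-sm-4'>Market Cap: <b><i class="fa fa-inr"></i> 10.64 Crores</b></h4>
    <h4 class='col-sm-4'>Current Price: <b><i class="fa fa-inr"></i> 35.35</b></h4>
    <h4 class='col-sm-4'>Book Value: <b><i class="fa fa-inr"></i> 53.52</b></h4>
</div>
"""
soup = BeautifulSoup(data, "html.parser")

pairs = []
for row in soup.find_all("div", "row"):
    for h4 in row.find_all("h4"):
        # " " joins the text pieces; strip=True trims the whitespace around each piece
        label, _, value = h4.get_text(" ", strip=True).partition(":")
        pairs.append((label.strip(), value.strip()))

for label, value in pairs:
    print(f"{label}: {value}")
```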