Python HTML div class

Time: 2016-08-28 15:45:27

Tags: python html beautifulsoup

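I am trying to write a simple program that stores the values of a table in a matrix (later I want to send the matrix to a database).

Here is my code: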

pfad = "https://business.facebook.com/ads/manager/account/ads/?act=516059741896803&pid=p2&report_spec=6056690557117&business_id=401807279988717"
html = urlopen(pfad)
r=requests.get(pfad)
soup = BeautifulSoup(html.read(),'html.parser')
mydivs = soup.findAll("div", { "class" : "ellipsis_1ha3" })

# no output:
for div in mydivs: 
    if (div["class"]=="ellipsis_1ha3"):
        print div
# output: []
print(mydivs)

I want the values inside the divs that have the class ellipsis_1ha3, but I don't know why it doesn't work. Can anyone help me?

Here is an HTML example similar to the original:

<!DOCTYPE html>
<html>
<head>
    <style>

        .ellipsis_1ha3 
        {
            width: 100px;
            border: 1px solid black;
        }
        .a      
        {
            width: 100px;
            border: 1px solid black;
        }

    </style>
</head>

<body>
<div>
    <div style="display: inline-flex;">
        <div class="a">Purchase</div>
        <div class="a">Clicks</div>
    </div>
    <br>
    <div style="display: inline-flex;">
        <div class="ellipsis_1ha3">20</div>
        <div class="ellipsis_1ha3">30</div>
    </div>
    <br>
    <div style="display: inline-flex;">
        <div class="ellipsis_1ha3">10</div>
        <div class="ellipsis_1ha3">50</div>
    </div>
</div>


</body>
</html>

Second example:

pfad = "http://www.bundesliga.de/de/liga/tabelle/"
html = urlopen(pfad)
soup = BeautifulSoup(html.read(),'html.parser')
mydivs = soup.findAll('div', { 'class' : 'wwe-cursor-pointer' })
for div in mydivs: 
    if ("wwe-cursor-pointer" in div["class"]):
        print div

1 answer:

Answer 0 (score: 0)

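Try using lxml with an XPath expression to extract the relevant information. As far as I know, BeautifulSoup is not built on lxml, but it can use lxml as its underlying parser. Suppose you have loaded the document into a string named html_string: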

from lxml import html

h = html.fromstring(html_string)

h.xpath('//div[@class="ellipsis_1ha3"]/node()')
#output:
['20', '30', '10', '50']
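
The XPath call above returns one flat list, while the question asks for a matrix. As a rough sketch of how the values could be grouped row by row, here is a version that uses only the standard library's html.parser module (a different technique from the lxml call above, chosen so that it runs without extra packages). It assumes Python 3 and assumes that each row of the real page is a parent div wrapping its cell divs, as in the sample HTML from the question. The names SAMPLE_HTML and CellCollector are hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser

# Trimmed copy of the sample HTML from the question. Assumption: the real
# page nests its cells inside one parent <div> per row in the same way.
SAMPLE_HTML = """
<div>
    <div style="display: inline-flex;">
        <div class="a">Purchase</div>
        <div class="a">Clicks</div>
    </div>
    <br>
    <div style="display: inline-flex;">
        <div class="ellipsis_1ha3">20</div>
        <div class="ellipsis_1ha3">30</div>
    </div>
    <br>
    <div style="display: inline-flex;">
        <div class="ellipsis_1ha3">10</div>
        <div class="ellipsis_1ha3">50</div>
    </div>
</div>
"""


class CellCollector(HTMLParser):
    """Hypothetical helper: collects the text of divs with a given class,
    producing one list of cell strings per enclosing row div."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.rows = []          # finished rows, each a list of cell strings
        self._stack = []        # one bool per open <div>: is it a target cell?
        self._current_row = []  # cells collected since the last row ended

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        # The class attribute may hold several space-separated names.
        classes = (dict(attrs).get("class") or "").split()
        is_cell = self.css_class in classes
        self._stack.append(is_cell)
        if is_cell:
            self._current_row.append("")  # start a new, empty cell

    def handle_data(self, data):
        # Keep text only while the innermost open div is a target cell.
        if self._stack and self._stack[-1]:
            self._current_row[-1] += data

    def handle_endtag(self, tag):
        if tag != "div" or not self._stack:
            return
        was_cell = self._stack.pop()
        # Closing a non-cell div that contained cells ends the current row.
        if not was_cell and self._current_row:
            self.rows.append([cell.strip() for cell in self._current_row])
            self._current_row = []


parser = CellCollector("ellipsis_1ha3")
parser.feed(SAMPLE_HTML)

# Convert the cell strings to integers to get a numeric matrix.
matrix = [[int(cell) for cell in row] for row in parser.rows]
print(matrix)  # [[20, 30], [10, 50]]
```

Grouping by the enclosing div keeps the row structure that a flat list loses, which should make it easier to write each row to a database later. If the real page nests its cells differently, the row-boundary rule in handle_endtag would need to be adapted.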