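I am trying to write a simple program that saves the values of a table in a matrix (later I want to send the matrix to a database).
Here is my code: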
from urllib.request import urlopen
import requests
from bs4 import BeautifulSoup
pfad = "https://business.facebook.com/ads/manager/account/ads/?act=516059741896803&pid=p2&report_spec=6056690557117&business_id=401807279988717"
html = urlopen(pfad)
r = requests.get(pfad)
soup = BeautifulSoup(html.read(), 'html.parser')
mydivs = soup.findAll("div", {"class": "ellipsis_1ha3"})
# no output:
for div in mydivs:
    if (div["class"] == "ellipsis_1ha3"):
        print(div)
# output: []
print(mydivs)
I want to get the values of the divs with the class ellipsis_1ha3, but I don't know why it doesn't work. Can anyone help me?
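One detail that explains why the if check in the loop never fires: BeautifulSoup treats class as a multi-valued attribute, so div["class"] is a list such as ['ellipsis_1ha3'], and comparing that list to the string "ellipsis_1ha3" is always False. The minimal sketch below runs on a small hypothetical HTML string (not the real page) to show the difference. Separately, the empty list printed by print(mydivs) means the fetched HTML contains no such divs at all; the most likely cause (an assumption, not verified against the page) is that Ads Manager requires a logged-in session and builds its table with JavaScript, which neither urlopen nor requests executes.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for the real page (assumption)
sample = '<div class="ellipsis_1ha3">20</div><div class="ellipsis_1ha3">30</div>'
soup = BeautifulSoup(sample, "html.parser")

for div in soup.find_all("div", class_="ellipsis_1ha3"):
    # "class" is multi-valued in BeautifulSoup, so this is a list
    print(div["class"])                     # ['ellipsis_1ha3']
    print(div["class"] == "ellipsis_1ha3")  # False: list compared to a string
    print("ellipsis_1ha3" in div["class"])  # True: membership test works
    print(div.get_text(strip=True))         # '20', then '30'
```

Testing membership with in (as the second example further down does) or simply dropping the if, since find_all already filters by class, avoids the problem.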
Here is an HTML example similar to the original page:
<!DOCTYPE html>
<html>
  <head>
    <style>
      .ellipsis_1ha3 {
        width: 100px;
        border: 1px solid black;
      }
      .a {
        width: 100px;
        border: 1px solid black;
      }
    </style>
  </head>
  <body>
    <div>
      <div style="display: inline-flex;">
        <div class="a">Purchase</div>
        <div class="a">Clicks</div>
      </div>
      <br>
      <div style="display: inline-flex;">
        <div class="ellipsis_1ha3">20</div>
        <div class="ellipsis_1ha3">30</div>
      </div>
      <br>
      <div style="display: inline-flex;">
        <div class="ellipsis_1ha3">10</div>
        <div class="ellipsis_1ha3">50</div>
      </div>
    </div>
  </body>
</html>
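For a static page shaped like this sample, a minimal BeautifulSoup sketch can build the header row and the matrix directly. It assumes that each table row is a div with the inline style "display: inline-flex;" (true for the sample, not guaranteed for the real page) and that every value cell holds an integer; the string html_string is a trimmed copy of the sample above.

```python
from bs4 import BeautifulSoup

# Assumption: a trimmed copy of the sample page above
html_string = """
<div>
  <div style="display: inline-flex;">
    <div class="a">Purchase</div>
    <div class="a">Clicks</div>
  </div>
  <br>
  <div style="display: inline-flex;">
    <div class="ellipsis_1ha3">20</div>
    <div class="ellipsis_1ha3">30</div>
  </div>
  <br>
  <div style="display: inline-flex;">
    <div class="ellipsis_1ha3">10</div>
    <div class="ellipsis_1ha3">50</div>
  </div>
</div>
"""

soup = BeautifulSoup(html_string, "html.parser")

# Header row: the texts of the divs with class "a"
header = [d.get_text(strip=True) for d in soup.find_all("div", class_="a")]

# One matrix row per inline-flex container that holds value cells;
# the header container has no ellipsis_1ha3 cells and is skipped
matrix = []
for row in soup.find_all("div", style="display: inline-flex;"):
    cells = row.find_all("div", class_="ellipsis_1ha3")
    if cells:
        # Assumption: every cell contains an integer
        matrix.append([int(c.get_text(strip=True)) for c in cells])

print(header)  # ['Purchase', 'Clicks']
print(matrix)  # [[20, 30], [10, 50]]
```

Keeping the matrix as a list of lists makes it straightforward to pass the rows to a database driver later, one row per INSERT.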
Second example:
from urllib.request import urlopen
from bs4 import BeautifulSoup
pfad = "http://www.bundesliga.de/de/liga/tabelle/"
html = urlopen(pfad)
soup = BeautifulSoup(html.read(), 'html.parser')
mydivs = soup.findAll('div', {'class': 'wwe-cursor-pointer'})
for div in mydivs:
    if ("wwe-cursor-pointer" in div["class"]):
        print(div)
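In this second example the membership test "wwe-cursor-pointer" in div["class"] is the correct way to check a class, but it is redundant: a class filter in find_all already returns only divs that carry that class, even when they also have other classes. The sketch below uses hypothetical markup, since the real bundesliga.de table structure is not shown here, and compares the keyword filter with the equivalent CSS selector.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real bundesliga.de page may differ (assumption)
sample = """
<div class="wwe-cursor-pointer team">Team A</div>
<div class="team">Team B</div>
<div class="team wwe-cursor-pointer">Team C</div>
"""
soup = BeautifulSoup(sample, "html.parser")

# class_ matches if any one of a tag's classes equals the given name
by_keyword = soup.find_all("div", class_="wwe-cursor-pointer")

# CSS selector form of the same query
by_selector = soup.select("div.wwe-cursor-pointer")

print([d.get_text() for d in by_keyword])   # ['Team A', 'Team C']
print([d.get_text() for d in by_selector])  # ['Team A', 'Team C']
```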
Answer:
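Try using lxml with an XPath expression to extract the relevant information. (BeautifulSoup is not built on lxml, but it can use lxml as its parser, e.g. BeautifulSoup(markup, 'lxml'); the code in the question uses Python's built-in 'html.parser' instead.) Assuming you have loaded the document into a string named html_string: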
from lxml import html
h = html.fromstring(html_string)
h.xpath('//div[@class="ellipsis_1ha3"]/node()')
#output:
['20', '30', '10', '50']
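The XPath above returns a flat list, while the question asks for a matrix. One way to keep the rows, sketched here under the assumption that each row is the direct parent of its value cells (as in the sample), is to select the row containers first and then read the cells of each one. The sketch also swaps @class="ellipsis_1ha3", which only matches when that is the element's sole class, for the usual XPath 1.0 token test with contains, concat and normalize-space. The name CELL is just a helper string for this example.

```python
from lxml import html

# Assumption: a trimmed copy of the sample page from the question
html_string = """
<html><body><div>
  <div style="display: inline-flex;">
    <div class="a">Purchase</div><div class="a">Clicks</div>
  </div>
  <div style="display: inline-flex;">
    <div class="ellipsis_1ha3">20</div><div class="ellipsis_1ha3">30</div>
  </div>
  <div style="display: inline-flex;">
    <div class="ellipsis_1ha3">10</div><div class="ellipsis_1ha3">50</div>
  </div>
</div></body></html>
"""

# Token-based class test: also matches class="ellipsis_1ha3 other"
CELL = ('div[contains(concat(" ", normalize-space(@class), " "), '
        '" ellipsis_1ha3 ")]')

tree = html.fromstring(html_string)

# A row is any div with at least one value cell as a direct child;
# each row becomes a list of the cell texts, in document order
matrix = [
    [cell.text_content().strip() for cell in row.xpath("./" + CELL)]
    for row in tree.xpath("//div[" + CELL + "]")
]
print(matrix)  # [['20', '30'], ['10', '50']]
```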