I want to extract all (in this case, two) hashtags from a web page.
However, I am only interested in the hashtags in one branch (the wrapper div in this example): "#hash1 with space" and "#hash2withoutsace". My HTML currently looks like this:
<html>
<head>
</head>
<body>
<div class="predefinition">
<p class="part1">
<span class="part1-head">Entries:</span>
<a class="pr" href="/go_somewhere/">#hashA with space</a>,
<a class="pr" href="/go_somewhere/">#hashBwithoutsace</a>,
</p>
<span class="part2">Boundaries:</span>
<p>some boundary statement</p>
</div>
<div class="wrapper"> <!-- I only want to search here -->
<p class="part1">
<span class="part1-head">Entries:</span>
<a class="pr" href="/go_somewhere/">#hash1 with space</a>, <!-- I only want to find this -->
<a class="pr" href="/go_somewhere/">#hash2withoutsace</a>, <!-- and this -->
</p>
<span class="part2">Boundaries:</span>
<p>some other boundary statement</p>
</div>
</body>
</html>
Answer 0 (score: 1)
You can first locate the div with class wrapper, then collect the text of every a tag with class pr inside it:
from bs4 import BeautifulSoup as soup
# `content` is assumed to hold the HTML string shown in the question
wrapper = soup(content, 'html.parser').find('div', {'class': 'wrapper'})
results = [a.text for a in wrapper.find_all('a', {'class': 'pr'})]
Output:
['#hash1 with space', '#hash2withoutsace']
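If installing BeautifulSoup is not an option, the same scoping idea can be sketched with only the standard library's html.parser module. This is a minimal sketch rather than a drop-in replacement: the names WrapperLinkParser, extract_wrapper_hashtags, and SAMPLE_HTML are hypothetical, and the sample is a trimmed copy of the HTML from the question. It assumes reasonably well-formed markup where every div is closed.

```python
from html.parser import HTMLParser

# Trimmed copy of the question's HTML (hypothetical constant for illustration).
SAMPLE_HTML = """
<div class="predefinition">
<p class="part1">
<a class="pr" href="/go_somewhere/">#hashA with space</a>,
<a class="pr" href="/go_somewhere/">#hashBwithoutsace</a>,
</p>
</div>
<div class="wrapper"> <!-- I only want to search here -->
<p class="part1">
<a class="pr" href="/go_somewhere/">#hash1 with space</a>,
<a class="pr" href="/go_somewhere/">#hash2withoutsace</a>,
</p>
</div>
"""


class WrapperLinkParser(HTMLParser):
    """Collect the text of <a class="pr"> tags nested in <div class="wrapper">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0    # 0 means we are outside the wrapper div
        self.in_link = False  # True while inside a matching <a> tag
        self.buffer = []      # text fragments of the current link
        self.results = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div":
            # Enter the wrapper, or track a div nested inside it.
            if self.div_depth > 0 or "wrapper" in classes:
                self.div_depth += 1
        elif tag == "a" and self.div_depth > 0 and "pr" in classes:
            self.in_link = True
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1
        elif tag == "a" and self.in_link:
            self.results.append("".join(self.buffer).strip())
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.buffer.append(data)


def extract_wrapper_hashtags(html):
    """Return the link texts found inside the wrapper div."""
    parser = WrapperLinkParser()
    parser.feed(html)
    parser.close()
    return parser.results


if __name__ == "__main__":
    print(extract_wrapper_hashtags(SAMPLE_HTML))
```

The depth counter is what restricts the search to one branch: links are only recorded while the parser is inside the wrapper div, so the links under predefinition are skipped.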