I have the following element:
<div class="column4">
Unlimited Subscription<br/> Discount for Monthly <br/> Total Amount
</div>
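How can I extract the three strings as three separate elements using only Beautiful Soup? String conversion and regular expressions are not allowed.
Expected output: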
Unlimited Subscription
Discount for Monthly
Total Amount
Answer 0 (score: 2)
To get the individual strings, you can take the children of the div element and filter them by type:
>>> import bs4
>>> # html is assumed to hold the markup from the question
>>> bs = bs4.BeautifulSoup(html, "html.parser")
>>> div = bs.find(attrs={"class": "column4"})
>>> [c.strip() for c in div.children if type(c) is bs4.element.NavigableString]
['Unlimited Subscription', 'Discount for Monthly', 'Total Amount']
Or, shorter, use div.stripped_strings (or just div.strings if you do not want the strings stripped).
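The shorter variant using the div's string generators can be sketched as follows. This is a minimal example, assuming the question's markup is stored in a variable named html (a name chosen here for illustration) and parsed with the built-in html.parser:

```python
from bs4 import BeautifulSoup

# Assumption: `html` holds the markup from the question.
html = """<div class="column4">
Unlimited Subscription<br/> Discount for Monthly <br/> Total Amount
</div>"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="column4")

# stripped_strings yields each text node with surrounding whitespace
# removed, skipping nodes that are empty after stripping.
parts = list(div.stripped_strings)

# strings yields the same text nodes untouched, whitespace included.
raw_parts = list(div.strings)

print(parts)
```

Both attributes are generators, so wrap them in list() when you need indexing or a length.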
Answer 1 (score: 0)
If you want the output in the form shown above, you can do the following:
from bs4 import BeautifulSoup
html_elem ="""
<div class="column4">
Unlimited Subscription<br/> Discount for Monthly <br/> Total Amount
</div>
"""
soup = BeautifulSoup(html_elem, 'lxml')
for item in soup.select(".column4"):
    # Replace each <br/> with a newline so the text splits across lines
    for data in item.select("br"):
        data.replace_with("\n")
    print(item.text.strip())
Output:
Unlimited Subscription
Discount for Monthly
Total Amount
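If modifying the parsed tree is undesirable, a similar line-per-string result can likely be obtained with get_text and its separator and strip arguments. The sketch below is an alternative to the approach above, not the answerer's method; it reuses the html_elem name and swaps in the built-in html.parser so that lxml is not required:

```python
from bs4 import BeautifulSoup

html_elem = """
<div class="column4">
Unlimited Subscription<br/> Discount for Monthly <br/> Total Amount
</div>
"""

# Assumption: html.parser is used here instead of lxml to avoid the extra dependency.
soup = BeautifulSoup(html_elem, "html.parser")
div = soup.select_one(".column4")

# strip=True strips each text node before joining; separator goes between them.
text = div.get_text(separator="\n", strip=True)
print(text)
```

Because each text node is stripped before joining, the lines carry no leading or trailing spaces, and the tree itself is left unchanged.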
Answer 2 (score: -1)
from bs4 import BeautifulSoup
html_doc = """<div class="column4">
Unlimited Subscription<br/> Discount for Monthly <br/> Total Amount
</div>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
# Prints the div's text as a single string with outer whitespace removed
print(soup.find("div").text.strip())