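I want to extract "climate 8/17/2019 2:00 PM" from the HTML 'a' tag shown below. I wrote code that I expected to pull all the text out of the 'a' tag, planning to then use string manipulation to get the substring I need.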
<div class="topic">
<a class="class_a" href="/href_1" data1="" data2="hello" data3="Hi" date="Monday, August 17" time="2:00 PM" topic="climate 8/17/2019 2:00 PM">
<span>2:00 PM</span>
<i class="Afternoon"></i>
</a>
</div>
I ran the code below, and the result was:
2:00 PM
I also changed the line
bar = topics.find('a')
to
bar = topics.find('a', {"class": "class_a"})
but it didn't help.
I checked that the bar variable is of type bs4.element.Tag (a class, not a string).
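The result "2:00 PM" is expected: on a Tag, .text (like get_text()) joins only the text nodes inside the element, here the <span> content, while the wanted string lives in the topic attribute, which BeautifulSoup keeps in a separate dict, Tag.attrs. The following minimal sketch parses the question's snippet with the built-in "html.parser" (an assumption; the question fetches a placeholder URL) and shows both views side by side.

```python
from bs4 import BeautifulSoup

# The HTML snippet from the question.
html = """<div class="topic">
<a class="class_a" href="/href_1" data1="" data2="hello" data3="Hi" date="Monday, August 17" time="2:00 PM" topic="climate 8/17/2019 2:00 PM">
<span>2:00 PM</span>
<i class="Afternoon"></i>
</a>
</div>"""

soup = BeautifulSoup(html, "html.parser")
bar = soup.find("a", class_="class_a")

# Text nodes only; strip=True drops the whitespace-only newlines around <span> and <i>.
text = bar.get_text(strip=True)

# Attributes are stored separately, as a plain dict on the Tag.
attrs = bar.attrs

print(repr(text))      # '2:00 PM'
print(attrs["topic"])  # climate 8/17/2019 2:00 PM
print(attrs["class"])  # ['class_a']  (class is multi-valued, so it comes back as a list)
```

Note that class is returned as a list while ordinary attributes such as topic are returned as strings.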
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen('https://tbd.com')
bs = BeautifulSoup(html.read(), 'html.parser')
topics = bs.findAll("div", {"class": "topic"})
for topic in topics:
bar = topic.find('a')
print (bar.text)
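Answer 0 (score: 3)
If you already know the class of the element you want to extract the text from, you can read the value from its attributes just as you would from any Python dict: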
from bs4 import BeautifulSoup
h = """<div class="topic">
<a class="class_a" href="/href_1" data1="" data2="hello" data3="Hi" date="Monday, August 17" time="2:00 PM" topic="climate 8/17/2019 2:00 PM">
<span>2:00 PM</span>
<i class="Afternoon"></i>
</a>
</div>"""
soup = BeautifulSoup(h, "lxml")
obj = soup.find('a', class_ = "class_a")
print(obj.get('topic'))
#climate 8/17/2019 2:00 PM
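Because attribute access mirrors a dict, the two styles used in these answers differ in how they handle a missing attribute: .get() returns None (or a default you pass), while subscripting raises KeyError. The sketch below illustrates this on a trimmed version of the question's tag; the attribute name "location" is hypothetical and chosen only because it does not exist. It also uses "html.parser" instead of "lxml", which avoids the extra third-party lxml install.

```python
from bs4 import BeautifulSoup

html = '<a class="class_a" topic="climate 8/17/2019 2:00 PM"><span>2:00 PM</span></a>'
tag = BeautifulSoup(html, "html.parser").a

print(tag.get("topic"))            # climate 8/17/2019 2:00 PM
print(tag.get("location"))         # None: missing attribute, no exception
print(tag.get("location", "n/a"))  # n/a: caller-supplied default

try:
    tag["location"]                # subscript access behaves like dict[key]
except KeyError as exc:
    print("KeyError:", exc)        # KeyError: 'location'
```

Use tag['topic'] when the attribute must be present (a crash is then a useful signal), and tag.get('topic') when some elements may legitimately lack it.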
Answer 1 (score: 1)
You want to extract the value of the topic attribute, so you should access it as a key, just as you would in a dictionary:
print(bar['topic'])
Answer 2 (score: 1)
You should get the value of the topic attribute rather than the anchor text, as shown below:
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen('https://tbd.com')
bs = BeautifulSoup(html.read(), 'html.parser')
topics = bs.findAll("div", {"class": "topic"})
for topic in topics:
bar = topic.find('a')
print (bar.get('topic'))
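This loop assumes every div.topic contains an <a>. If one does not, topic.find('a') returns None and the .get() call raises AttributeError. The following sketch, run against a hypothetical page (inline HTML in place of the placeholder URL https://tbd.com), shows two guarded variants: the same find() loop with a None check, and a CSS selector via select() that keeps only direct <a> children of div.topic carrying a topic attribute. find_all is the current name for the older camelCase findAll.

```python
from bs4 import BeautifulSoup

# Hypothetical page: the second div.topic has no <a>; the third has an <a> without a topic attribute.
html = """
<div class="topic"><a class="class_a" href="/href_1" topic="climate 8/17/2019 2:00 PM"><span>2:00 PM</span></a></div>
<div class="topic"><span>no link here</span></div>
<div class="topic"><a class="class_a" href="/href_2"><span>3:00 PM</span></a></div>
"""

bs = BeautifulSoup(html, "html.parser")

# Option 1: keep the find() loop, but skip divs with no <a> or no topic attribute.
topics_loop = []
for topic in bs.find_all("div", class_="topic"):
    bar = topic.find("a")
    if bar is not None and bar.has_attr("topic"):
        topics_loop.append(bar["topic"])

# Option 2: let a CSS selector do the filtering.
topics_css = [a["topic"] for a in bs.select("div.topic > a[topic]")]

print(topics_loop)  # ['climate 8/17/2019 2:00 PM']
print(topics_css)   # ['climate 8/17/2019 2:00 PM']
```

The selector version is shorter, but the explicit loop makes it easier to log or handle the divs that were skipped.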
Answer 3 (score: 1)
I think your main problem is that inside the loop you referred to "topics" (plural) when you wanted "topic" (singular).
# python3 bs_test.py
from urllib.request import urlopen
from bs4 import BeautifulSoup
# html = urlopen('https://tbd.com')
html = """
<div class="topic">
<a class="class_a" href="/href_1" data1="" data2="hello" data3="Hi" date="Monday, August 17" time="2:00 PM" topic="climate 8/17/2019 2:00 PM">
<span>2:00 PM</span>
<i class="Afternoon"></i>
</a>
</div>
"""
# bs = BeautifulSoup(html.read(), 'html.parser')
bs = BeautifulSoup(html, 'html.parser')
topics = bs.findAll("div", {"class": "topic"})
for topic in topics:
bar = topic.find('a')
print (bar['topic'])