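I want to access the span that contains the date, but when I write article.h3.span it gives me the first span (the one containing "/"). How can I access the span with the date?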
<a class="category-link" href="https://www.japantimes.co.jp/news_category/world/">
World
</a>
<span>
/
</span>
<a class="category-link" href="https://www.japantimes.co.jp/news_category/crime-legal-world/">
Crime & Legal
</a>
<span class="right date">
Mar 19, 2019
</span>
</h3>
Here is the code:
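import ssl
import urllib.request

from bs4 import BeautifulSoup

# Disable HTTPS certificate verification (as in the original attempt)
ssl._create_default_https_context = ssl._create_unverified_context
url = "https://www.japantimes.co.jp/tag/cybersecurity/page/1/"
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, 'html.parser')
article = soup.find('article')
# Returns the first <span> inside <h3>, i.e. the "/" separator, not the date
date = article.h3.span.text
print(date)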
Answer 0: (score: 0)
You can use .next on the matching span to get the date; here it returns the text node inside the span. See the code below!
from bs4 import BeautifulSoup

html = '''
<a class="category-link" href="https://www.japantimes.co.jp/news_category/world/">
World
</a>
<span>
/
</span>
<a class="category-link" href="https://www.japantimes.co.jp/news_category/crime-legal-world/">
Crime & Legal
</a>
<span class="right date">
Mar 19, 2019
</span>
</h3>'''
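soup = BeautifulSoup(html, 'html.parser')
# Find the span whose class attribute is "right date", then take the text node that follows its opening tag
date = soup.find('span', attrs={'class': 'right date'}).next
print(date.strip())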
Output:
Mar 19, 2019
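As for why article.h3.span returned "/": attribute-style access such as tag.span is shorthand for tag.find('span'), which returns only the first match. The sketch below illustrates this on a condensed copy of the markup. The opening `<h3>` tag is an assumption, since the question shows only the closing one, and taking the last span is just one possible workaround that relies on the date always coming last.

```python
from bs4 import BeautifulSoup

# Condensed copy of the question's markup; the opening <h3> is assumed.
html = """
<h3>
<a class="category-link" href="https://www.japantimes.co.jp/news_category/world/">World</a>
<span>/</span>
<a class="category-link" href="https://www.japantimes.co.jp/news_category/crime-legal-world/">Crime &amp; Legal</a>
<span class="right date">Mar 19, 2019</span>
</h3>
"""

soup = BeautifulSoup(html, "html.parser")
h3 = soup.h3

# h3.span is equivalent to h3.find("span"): both return the first <span>.
first_span = h3.span.get_text(strip=True)
same_as_find = h3.find("span").get_text(strip=True)

# find_all returns every <span>; the date is the last one in this markup.
last_span = h3.find_all("span")[-1].get_text(strip=True)

print(first_span)
print(last_span)
```

Selecting by class, as in the answers here, is more robust than relying on position.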
Answer 1: (score: 0)
You can do this by finding the span tag with class="right date":
import urllib.request

from bs4 import BeautifulSoup

url = "https://www.japantimes.co.jp/tag/cybersecurity/page/1/"
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, 'html.parser')
# class_ avoids the clash with Python's reserved word "class"
date = soup.find('span', class_="right date")
print(date.text.strip())
Output:
Mar 19, 2019
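One caveat, sketched below on a made-up snippet: passing the full string "right date" to class_ matches the class attribute exactly as written, so it would miss a tag whose classes appear in a different order. Matching on a single class, or using a CSS selector, does not depend on the order. The HTML here is hypothetical and deliberately reverses the class order to show the difference.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: same classes as in the question, but in reverse order.
html = '<h3><span>/</span><span class="date right">Mar 19, 2019</span></h3>'
soup = BeautifulSoup(html, "html.parser")

# Exact-string match on the class attribute: fails because the order differs.
exact = soup.find("span", class_="right date")

# Matching a single class works regardless of the other classes on the tag.
single = soup.find("span", class_="date")

# A CSS selector requiring both classes also ignores their order.
css = soup.select_one("span.right.date")

print(exact)
print(single.get_text())
print(css.get_text())
```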
Answer 2: (score: 0)
For that particular date, you can use a single-class selector, which is faster:
item = soup.select_one('.date').text
If you want all of them:
items = [item.text for item in soup.select('.date')]
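If the dates are needed as date objects rather than strings, one possible follow-up is to strip the surrounding whitespace and parse each value. This is only a sketch: the two-article HTML below is invented for illustration, and the format string "%b %d, %Y" assumes every date looks like "Mar 19, 2019" and that month abbreviations are parsed under an English locale.

```python
from datetime import datetime

from bs4 import BeautifulSoup

# Hypothetical page fragment with two articles, modelled on the question's markup.
html = """
<article><h3><span>/</span><span class="right date"> Mar 19, 2019 </span></h3></article>
<article><h3><span>/</span><span class="right date"> Feb 27, 2019 </span></h3></article>
"""

soup = BeautifulSoup(html, "html.parser")

# Collect the text of every element with the "date" class, without padding.
raw_dates = [tag.get_text(strip=True) for tag in soup.select("span.date")]

# Parse "Mar 19, 2019" style strings into datetime.date objects.
parsed = [datetime.strptime(text, "%b %d, %Y").date() for text in raw_dates]

print(raw_dates)
print(parsed)
```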