Question

I want to extract the date and the summary (sub-heading) of an article on a website. Here is my code:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

full_url = 'https://www.wsj.com/articles/readers-favorite-summer-recipes-11599238648?mod=searchresults&page=1&pos=20'

# Selenium 4 takes the chromedriver path through a Service object
browser0 = webdriver.Chrome(service=Service('C:/Users/liuzh/Downloads/chromedriver_win32/chromedriver'))
browser0.get(full_url)

# Grab the rendered HTML, then close the browser
html0 = browser0.page_source
browser0.quit()
page_soup = BeautifulSoup(html0, 'html5lib')

# find_all() returns a ResultSet (a list of Tag objects), not plain strings
date = page_soup.find_all("time", class_="timestamp article__timestamp flexbox__flex--1")
sub_head = page_soup.find_all("h2", class_="sub-head")
print(date)
print(sub_head)

I get the following output. How can I get it as plain text instead (e.g., Sept. 4, 2020 12:57 pm ET; This Labor Day weekend, ...)?

[<time class="timestamp article__timestamp flexbox__flex--1">
        Sept. 4, 2020 12:57 pm ET
      </time>]
[<h2 class="sub-head" itemprop="description">This Labor Day weekend, we’re savoring the last of summer with a collection of seasonal recipes shared by Wall Street Journal readers. Each one comes with a story about what this food means to a family and why they return to it each year.</h2>]
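The brackets and tags appear because `find_all()` returns a `bs4.element.ResultSet`, which is a list of `Tag` objects, so `print()` shows the list with each tag's raw HTML. The sketch below parses the two tags from the output above as a static sample (so it runs without Selenium or network access) and uses the built-in `html.parser` instead of `html5lib`; the name `SAMPLE_HTML` is illustrative only. It shows the list-like result, then uses `find()` (first match or `None`) plus `get_text(strip=True)` to get clean strings.

```python
from bs4 import BeautifulSoup

# The two tags printed in the question's output, pasted as a static sample
# (hypothetical stand-in for the live page source).
SAMPLE_HTML = """
<time class="timestamp article__timestamp flexbox__flex--1">
        Sept. 4, 2020 12:57 pm ET
      </time>
<h2 class="sub-head" itemprop="description">This Labor Day weekend, we’re savoring the last of summer with a collection of seasonal recipes shared by Wall Street Journal readers. Each one comes with a story about what this food means to a family and why they return to it each year.</h2>
"""

# html.parser ships with Python; html5lib (used in the question) also works if installed.
page_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all() returns a ResultSet: a list subclass holding Tag objects.
date = page_soup.find_all("time", class_="timestamp article__timestamp flexbox__flex--1")
print(type(date).__name__, len(date))  # ResultSet 1

# find() returns only the first matching Tag (or None), which suits one-per-page fields.
date_tag = page_soup.find("time", class_="timestamp")
sub_head_tag = page_soup.find("h2", class_="sub-head")

# get_text(strip=True) drops the surrounding newlines and indentation.
date_text = date_tag.get_text(strip=True) if date_tag else None
summary = sub_head_tag.get_text(strip=True) if sub_head_tag else None
print(date_text)  # Sept. 4, 2020 12:57 pm ET
print(summary)
```

Guarding against `None` matters on a live page: if the site changes its markup, `find()` returns `None` and calling `get_text()` on it would raise `AttributeError`.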

Thanks.

Answer 1

Try something like this:

for d in date:
    print(d.text.strip())

Given the sample HTML, the output should be:

Sept. 4, 2020 12:57 pm ET
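If you need an actual date object rather than a string, note that `datetime.strptime` with `%b` expects abbreviations like "Sep", not the AP-style "Sept." used here. The following is a minimal sketch with a hypothetical helper `parse_wsj_timestamp` and an assumed month table (AP style, not taken from any WSJ documentation). It ignores the "ET" label and returns a naive `datetime`; attach `zoneinfo.ZoneInfo("America/New_York")` yourself if you need time-zone awareness.

```python
import re
from datetime import datetime

# Assumed AP-style month spellings ("Sept.", "Aug.", "March", ...) mapped to numbers.
AP_MONTHS = {
    "jan": 1, "feb": 2, "march": 3, "april": 4, "may": 5, "june": 6,
    "july": 7, "aug": 8, "sept": 9, "oct": 10, "nov": 11, "dec": 12,
}

# Month word, optional period, day, comma, year, hh:mm, am/pm; trailing text is ignored.
TIMESTAMP_RE = re.compile(
    r"\s*([A-Za-z]+)\.?\s+(\d{1,2}),\s+(\d{4})\s+(\d{1,2}):(\d{2})\s*([ap]m)",
    re.IGNORECASE,
)

def parse_wsj_timestamp(text):
    """Hypothetical helper: 'Sept. 4, 2020 12:57 pm ET' -> naive datetime (ET dropped)."""
    m = TIMESTAMP_RE.match(text)
    if m is None:
        raise ValueError(f"unrecognized timestamp: {text!r}")
    month_name, day, year, hour, minute, ampm = m.groups()
    month = AP_MONTHS.get(month_name.lower())
    if month is None:
        raise ValueError(f"unknown month: {month_name!r}")
    # Convert 12-hour clock to 24-hour: 12 am -> 0, 12 pm -> 12.
    hour = int(hour) % 12
    if ampm.lower() == "pm":
        hour += 12
    return datetime(int(year), month, int(day), hour, int(minute))

print(parse_wsj_timestamp("Sept. 4, 2020 12:57 pm ET"))  # 2020-09-04 12:57:00
```

Because the regex allows leading whitespace, it also accepts the raw `d.text` value before `.strip()`.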

How to convert a bs4.element.ResultSet to a date/string?
