How to convert a bs4.element.ResultSet to a date/string?

Posted: 2020-09-08 20:06:51

Tags: python web-scraping beautifulsoup

I want to extract the date and the summary of an article on a website. Here is my code:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

full_url = 'https://www.wsj.com/articles/readers-favorite-summer-recipes-11599238648?mod=searchresults&page=1&pos=20'

# Selenium 4 expects the chromedriver path wrapped in a Service object
browser0 = webdriver.Chrome(service=Service('C:/Users/liuzh/Downloads/chromedriver_win32/chromedriver'))
browser0.get(full_url)

# Grab the rendered HTML, then close the browser
html0 = browser0.page_source
browser0.quit()

page_soup = BeautifulSoup(html0, 'html5lib')
date = page_soup.find_all("time", class_="timestamp article__timestamp flexbox__flex--1")
sub_head = page_soup.find_all("h2", class_="sub-head")
print(date)
print(sub_head)

I get the following result. How can I get just the plain text (e.g. "Sept. 4, 2020 12:57 pm ET" and "This Labor Day weekend, ...")?

[<time class="timestamp article__timestamp flexbox__flex--1">
        Sept. 4, 2020 12:57 pm ET
      </time>]
[<h2 class="sub-head" itemprop="description">This Labor Day weekend, we’re savoring the last of summer with a collection of seasonal recipes shared by Wall Street Journal readers. Each one comes with a story about what this food means to a family and why they return to it each year.</h2>]

Thanks.

1 answer:

Answer 0 (score: 0)

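find_all() returns a ResultSet, which is a list of Tag objects, not a string. Loop over it and take the text of each tag (the same works for sub_head). Try something like: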

for d in date:
    # each element is a Tag; .text is its text content, .strip() drops the surrounding whitespace
    print(d.text.strip())

Given the sample HTML, the output should be:

Sept. 4, 2020 12:57 pm ET
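A self-contained sketch of the same idea, run on the question's sample markup (summary trimmed) instead of the live WSJ page and using the built-in "html.parser". It shows that find_all() gives a ResultSet you have to iterate, that find() gives a single Tag (or None), and that get_text(strip=True) removes the surrounding whitespace. Because the title also asks for a date, it adds a hypothetical helper, parse_wsj_timestamp, which assumes the "Sept. 4, 2020 12:57 pm ET" layout and an English/C locale for strptime's %b and %p, and simply drops "ET", so the resulting datetime is naive (no timezone attached).

```python
from datetime import datetime

from bs4 import BeautifulSoup

# Sample markup copied from the question's output (summary trimmed).
html = """
<time class="timestamp article__timestamp flexbox__flex--1">
        Sept. 4, 2020 12:57 pm ET
      </time>
<h2 class="sub-head" itemprop="description">This Labor Day weekend, we're savoring the last of summer.</h2>
"""


def parse_wsj_timestamp(text):
    """Hypothetical helper: turn 'Sept. 4, 2020 12:57 pm ET' into a naive datetime.

    Assumes the 'Month[.] D, YYYY H:MM am/pm ET' layout from the sample.
    The 'ET' suffix is dropped, so no timezone is attached to the result.
    """
    text = text.strip()
    if text.endswith(" ET"):
        text = text[:-3]
    # strptime's %b wants 'Sep', not AP-style 'Sept.', so keep the first 3 letters.
    month, rest = text.split(" ", 1)
    month = month.rstrip(".")[:3]
    return datetime.strptime(f"{month} {rest}", "%b %d, %Y %I:%M %p")


soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet (a list of Tag objects): convert each Tag to a string.
# class_="timestamp" matches any tag whose class list contains "timestamp".
dates = [t.get_text(strip=True) for t in soup.find_all("time", class_="timestamp")]

# find() returns only the first matching Tag, or None if nothing matches.
sub_head_tag = soup.find("h2", class_="sub-head")
summary = sub_head_tag.get_text(strip=True) if sub_head_tag is not None else None

published = [parse_wsj_timestamp(d) for d in dates]

print(dates)
print(summary)
print(published)
```

Using find() is the simpler choice when the page has exactly one timestamp and one sub-head; find_all() plus a loop is the safer choice when there may be several matches.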