I want to extract the date and summary of an article on a website. Here is my code:
from bs4 import BeautifulSoup
from selenium import webdriver
full_url = 'https://www.wsj.com/articles/readers-favorite-summer-recipes-11599238648?mod=searchresults&page=1&pos=20'
url0 = full_url
browser0 = webdriver.Chrome('C:/Users/liuzh/Downloads/chromedriver_win32/chromedriver')
browser0.get(url0)
html0 = browser0.page_source
page_soup = BeautifulSoup(html0, 'html5lib')
date = page_soup.find_all("time", class_="timestamp article__timestamp flexbox__flex--1")
sub_head = page_soup.find_all("h2", class_="sub-head")
print(date)
print(sub_head)
I get the following result. How can I get the values in a standard form? (For example: Sept. 4, 2020 12:57 pm ET; This Labor Day weekend, ...)
[<time class="timestamp article__timestamp flexbox__flex--1">
Sept. 4, 2020 12:57 pm ET
</time>]
[<h2 class="sub-head" itemprop="description">This Labor Day weekend, we’re savoring the last of summer with a collection of seasonal recipes shared by Wall Street Journal readers. Each one comes with a story about what this food means to a family and why they return to it each year.</h2>]
Thanks.
Answer 0 (score: 0)
Try something like this:
for d in date:
    print(d.text.strip())
Given the sample HTML, the output should be:
Sept. 4, 2020 12:57 pm ET
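The same idea also covers the summary. The sketch below is only an illustration: it runs against the markup pasted in the question rather than the live page, and the helper name `extract_date_and_summary` is made up for this example. It assumes the page keeps the `timestamp` and `sub-head` classes shown in the question.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the output shown in the question (not fetched live).
SAMPLE_HTML = """
<time class="timestamp article__timestamp flexbox__flex--1">
          Sept. 4, 2020 12:57 pm ET
        </time>
<h2 class="sub-head" itemprop="description">This Labor Day weekend, we’re savoring the last of summer with a collection of seasonal recipes shared by Wall Street Journal readers. Each one comes with a story about what this food means to a family and why they return to it each year.</h2>
"""


def extract_date_and_summary(html):
    """Hypothetical helper: return (date_text, summary_text) as plain strings.

    Either value is None when the corresponding tag is not found.
    """
    soup = BeautifulSoup(html, "html.parser")
    # A CSS selector matches on a single class, so the order of the
    # other classes on the tag does not matter.
    time_tag = soup.select_one("time.timestamp")
    summary_tag = soup.select_one("h2.sub-head")
    # get_text(strip=True) drops the tag itself and the surrounding whitespace.
    date_text = time_tag.get_text(strip=True) if time_tag else None
    summary_text = summary_tag.get_text(strip=True) if summary_tag else None
    return date_text, summary_text


date_text, summary_text = extract_date_and_summary(SAMPLE_HTML)
print(f"{date_text}; {summary_text}")
```

With the code from the question, passing `browser0.page_source` instead of `SAMPLE_HTML` should work the same way, provided the page still uses those class names.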