Question

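I've only just started dabbling in Python and, like many people, I began exploring the language through web-scraping examples. I've seen plenty of examples that use zip and map to combine lists, but I run into trouble when I try to print the combined list. Again, I'm new, so please be gentle.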

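The code collects the contents of 2 specific tag types (each post's date and title) and returns them as 2 lists. I'm using BeautifulSoup and requests for this. The site I'm practicing on is the blog for a small game called "Staxel".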

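I can get my code to print the full list for one tag type by using soup.find and print in a for loop, but when I add the second list to the print, the script just ends without printing anything and without an error. Any tips on how to print 2 lists together correctly?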

I'm looking for output like:

Entry 2019-01-06 New Years

Entry 2018-11-30 Staxel Changelog for 1.3.52

# import libraries
import requests
import ssl
from bs4 import BeautifulSoup

# set the URL string
quote_page = 'https://blog.playstaxel.com'

# query the website and return the html to give us a 'page' variable
page = requests.get(quote_page)


# parse the html using beautiful soup and store in a variable ... 'soup'
soup = BeautifulSoup(page.content, 'lxml')

# Find every post title (h1) and publication date (span) tag and extract its text
title_box = soup.find_all('h1',attrs={'class':'entry-title'})
date_box = soup.find_all('span',attrs={'class':'entry-date published'})
titles = [title.text.strip() for title in title_box]
dates = [date.text.strip() for date in date_box]
date_list = zip(dates, titles)
for heading in date_list:
    print("Entry {} {}".format(*heading))

Answer 1

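The problem is that your date query returns an empty list, so the result of the zip operation is empty as well. To extract the dates from that page, look for tags of type time (not span) with the class entry-date published: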

Like this:

date_box = soup.find_all("time", attrs={"class": "entry-date published"})
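Why does an empty date list make the script end silently? zip() stops as soon as its shortest input runs out, so pairing an empty list with anything produces no pairs and the for loop body never executes. A minimal sketch of that behaviour, assuming hand-written sample data (the titles are copied from the output in this answer; the empty dates list stands in for what the original span-based query returned):

```python
# Illustrative data: titles copied from this answer's output; dates is left
# empty to mimic what the original span-based query returned.
titles = ["New Years", "Staxel Changelog for 1.3.52"]
dates = []

# zip() stops as soon as its shortest input is exhausted.
pairs = list(zip(dates, titles))
print(pairs)  # → []

# So the loop body never runs: nothing is printed and no error is raised.
for date, title in zip(dates, titles):
    print(f"Entry {date} {title}")

# The same truncation happens when the lists merely differ in length.
print(list(zip(["2019-01-06"], titles)))  # → [('2019-01-06', 'New Years')]
```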

Using the following code:

import requests
from bs4 import BeautifulSoup

quote_page = "https://blog.playstaxel.com"
page = requests.get(quote_page)
soup = BeautifulSoup(page.content, "lxml")

title_box = soup.find_all("h1", attrs={"class": "entry-title"})
date_box = soup.find_all("time", attrs={"class": "entry-date published"})
titles = [title.text.strip() for title in title_box]
dates = [date.text.strip() for date in date_box]

for date, title in zip(dates, titles):
    print(f"{date}: {title}")

The output becomes:

2019-01-10: Magic update – feature preview
2019-01-06: New Years
2018-11-30: Staxel Changelog for 1.3.52
2018-11-13: Staxel Changelog for 1.3.49
2018-10-21: Staxel Changelog for 1.3.48
2018-10-12: Halloween Update & GOG
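To get the exact "Entry <date> <title>" format from the question, and to fail loudly instead of printing nothing when a query matches no tags, the pairing can be wrapped in a small check. This is a sketch, not part of the original answer: format_entries is a hypothetical helper name, and the two lists are hard-coded sample data copied from the output above rather than scraped. On Python 3.10+, zip(dates, titles, strict=True) is a built-in alternative to the manual length check. The last lines show map(), which the question mentions: it also accepts several iterables and, like zip(), stops at the shortest one.

```python
# Hypothetical sample data copied from the output above; in the real script
# these lists come from the BeautifulSoup queries.
dates = ["2019-01-06", "2018-11-30"]
titles = ["New Years", "Staxel Changelog for 1.3.52"]


def format_entries(dates, titles):
    """Pair each date with its title as 'Entry <date> <title>' lines.

    Raises ValueError instead of silently printing nothing when a
    query matched no tags or the two lists have different lengths.
    """
    if not dates or not titles:
        raise ValueError("no dates or no titles found; check the tag/class filters")
    if len(dates) != len(titles):
        raise ValueError(f"{len(dates)} dates but {len(titles)} titles")
    return [f"Entry {date} {title}" for date, title in zip(dates, titles)]


for line in format_entries(dates, titles):
    print(line)

# map() with two iterables calls the function on each (date, title) pair.
lines_via_map = list(map(lambda date, title: f"Entry {date} {title}", dates, titles))
print(lines_via_map == format_entries(dates, titles))  # → True
```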

Python - Merge two single-column lists into one two-column list and print it