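I'm writing a program that fetches the headlines from Google News. It should print the title of each article and a link to it. However, it does not print the link.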
import bs4
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen
news_url="https://news.google.com/news/rss"
Client=urlopen(news_url)
xml_page=Client.read()
Client.close()
soup_page=soup(xml_page,"lxml")
news_list=soup_page.findAll("item")
# Print each news title and URL
for news in news_list:
    print(news.title.text)
    print(news.link.text)
    print("-"*10)
Here is a sample of the output:
Following Falcon 9 Saturday launch, CRS-17 Dragon arrives at the ISS
----------
It should print both the title and the link, but it only prints the title.
Answer 0 (score: 1)
You should change this line in your code:
soup_page=soup(xml_page,"lxml")
to:
soup_page=soup(xml_page,"xml")
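You will then get the links. With "lxml", BeautifulSoup uses an HTML parser, which treats <link> as an empty (void) element, so the URL ends up outside the tag and news.link.text is an empty string. With "xml", the feed is parsed as XML and the URL stays inside <link>.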
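The effect of the parser argument can be checked offline with a small sketch. It runs on a hand-written feed fragment rather than the live Google News feed, and the names SAMPLE_RSS and first_link_text are hypothetical, introduced only for this illustration. It assumes both bs4 and lxml are installed.

```python
from bs4 import BeautifulSoup

# Hypothetical, hand-written feed fragment (not real Google News output).
# Written without whitespace between tags to keep the parsed tree simple.
SAMPLE_RSS = (
    "<rss><channel><item>"
    "<title>Sample headline</title>"
    "<link>https://example.com/article</link>"
    "</item></channel></rss>"
)


def first_link_text(markup, parser):
    """Return the text inside the first <item>'s <link> tag for a given parser."""
    item = BeautifulSoup(markup, parser).find("item")
    return item.link.text


# "lxml" selects the HTML parser, where <link> cannot hold any content.
html_result = first_link_text(SAMPLE_RSS, "lxml")
# "xml" selects the XML parser, which keeps the URL inside <link>.
xml_result = first_link_text(SAMPLE_RSS, "xml")

print(repr(html_result))
print(repr(xml_result))
```

If the two printed values differ as described in this answer, the same one-word change should be enough for the original script.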
Answer 1 (score: 1)
This HTML has a strange structure, but if you change the for loop in your code to this:
for news in news_list:
    link = news.select_one('title')
    print(link.text)
    print(link.next_sibling.next_sibling)
    print("-"*10)
you should get the titles together with their links.
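A slightly more defensive sketch of the same sibling-based idea, again on a hand-written fragment instead of the live feed. SAMPLE_RSS and titles_and_links are hypothetical names, and the sketch assumes that the URL is the text node immediately after the emptied <link> tag, which may not hold if the feed layout changes. It uses news.title in place of select_one('title'), which should find the same tag here.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical, hand-written feed fragment (not real Google News output).
SAMPLE_RSS = (
    "<rss><channel>"
    "<item><title>First headline</title>"
    "<link>https://example.com/first</link></item>"
    "<item><title>Second headline</title>"
    "<link>https://example.com/second</link></item>"
    "</channel></rss>"
)


def titles_and_links(markup):
    """Return (title, url) pairs, reading each URL from the node after <link>."""
    soup = BeautifulSoup(markup, "lxml")  # same HTML parser as in the question
    pairs = []
    for news in soup.find_all("item"):
        title = news.title
        if title is None:
            continue  # skip items without a title
        link_tag = title.next_sibling
        url_node = link_tag.next_sibling if link_tag is not None else None
        # Only accept a plain text node; anything else means the layout differs.
        url = str(url_node).strip() if isinstance(url_node, NavigableString) else ""
        pairs.append((title.text, url))
    return pairs


pairs = titles_and_links(SAMPLE_RSS)
for headline, url in pairs:
    print(headline)
    print(url)
    print("-" * 10)
```

The isinstance check is there so that an unexpected layout produces an empty string rather than a stray tag or an AttributeError.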