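Link text is not printed from a BeautifulSoup object

Time: 2019-05-06 14:56:57

Tags: python python-3.x beautifulsoup urllib

I am writing a program that fetches headlines from Google News. It should print each headline together with the link to the article. However, it does not print the link.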

import bs4
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen

news_url="https://news.google.com/news/rss"
Client=urlopen(news_url)
xml_page=Client.read()
Client.close()

soup_page=soup(xml_page,"lxml")
news_list=soup_page.findAll("item")
# Print news title, url and publish date
for news in news_list:
  print(news.title.text)
  print(news.link.text)  
  print("-"*10)

Here is a sample of the output:

Following Falcon 9 Saturday launch, CRS-17 Dragon arrives at the ISS

----------

It should print both the title and the link, but it only prints the title.

2 Answers:

Answer 0: (score: 1)

You should change this line in your code:

soup_page=soup(xml_page,"lxml")

to:

soup_page=soup(xml_page,"xml")

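You will then get the expected result. The "lxml" parser reads the feed as HTML, where <link> is an empty element that cannot hold text, so the URL ends up outside the tag. The "xml" parser (which also requires lxml to be installed) keeps the URL inside <link>.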
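To see the difference without hitting the network, here is a minimal sketch that runs both parsers over a small inline feed. The `SAMPLE_RSS` string and the `titles_and_links` helper are made up for illustration and are not part of the original code; the sketch assumes `beautifulsoup4` and `lxml` are installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample feed, invented for illustration (not real Google News data).
SAMPLE_RSS = (
    '<rss version="2.0"><channel>'
    "<item><title>Example headline</title>"
    "<link>https://example.com/article</link></item>"
    "</channel></rss>"
)


def titles_and_links(markup, parser):
    """Return (title, link) text pairs for every <item>, using the given parser."""
    soup = BeautifulSoup(markup, parser)
    return [(item.title.text, item.link.text) for item in soup.find_all("item")]


if __name__ == "__main__":
    # HTML parser: <link> is treated as an empty element, so its text is empty.
    print(titles_and_links(SAMPLE_RSS, "lxml"))
    # XML parser: the URL stays inside <link>.
    print(titles_and_links(SAMPLE_RSS, "xml"))
```

In the question's code, the same change is just the second argument passed to `soup(...)`; the loop itself can stay as it is.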

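Answer 1: (score: 1)

The parsed HTML has an odd structure: the lxml HTML parser treats <link> as an empty tag, so the URL is left as plain text right after it. If you change the for loop in your code to this: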

for news in news_list:
    title = news.select_one('title')
    print(title.text)
    # title -> empty <link/> tag -> text node holding the URL
    print(title.next_sibling.next_sibling)
    print("-"*10)

you should get the title along with the link.
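The sibling hop above depends on there being no whitespace between the title and link tags. A slightly more direct variant, sketched here under the same "lxml" HTML parser, starts from the empty link tag itself and reads the text node that follows it. The `SAMPLE_RSS` string and the `link_after_tag` helper are hypothetical names used only for this illustration, and the sketch assumes `beautifulsoup4` and `lxml` are installed.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical sample feed, invented for illustration (not real Google News data).
SAMPLE_RSS = (
    '<rss version="2.0"><channel>'
    "<item><title>Example headline</title>"
    "<link>https://example.com/article</link></item>"
    "</channel></rss>"
)


def link_after_tag(item):
    """Return the URL text that the HTML parser leaves right after the empty <link/> tag."""
    sibling = item.link.next_sibling
    # Only accept a text node; anything else means the layout is not what we expect.
    return str(sibling).strip() if isinstance(sibling, NavigableString) else ""


if __name__ == "__main__":
    soup = BeautifulSoup(SAMPLE_RSS, "lxml")
    for item in soup.find_all("item"):
        print(item.title.text)
        print(link_after_tag(item))
        print("-" * 10)
```

This is a workaround for HTML-mode parsing; switching to the "xml" parser, as in the other answer, avoids the need for sibling navigation altogether.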