I'm trying to build a desktop notifier, and for that I'm scraping news from a website. When I run the program, I get the following error:
```
news[child.tag] = child.encode('utf8')
AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'encode'
```
How can I fix this? I'm completely new to this. I searched for solutions, but none of them worked for me.
Here is my code:
```python
import requests
import xml.etree.ElementTree as ET

# url of news rss feed
RSS_FEED_URL = "http://www.hindustantimes.com/rss/topnews/rssfeed.xml"

def loadRSS():
    '''
    utility function to load RSS feed
    '''
    # create HTTP request response object
    resp = requests.get(RSS_FEED_URL)
    # return response content
    return resp.content

def parseXML(rss):
    '''
    utility function to parse XML format rss feed
    '''
    # create element tree root object
    root = ET.fromstring(rss)
    # create empty list for news items
    newsitems = []
    # iterate news items
    for item in root.findall('./channel/item'):
        news = {}
        # iterate child elements of item
        for child in item:
            # special checking for namespace object content:media
            if child.tag == '{http://search.yahoo.com/mrss/}content':
                news['media'] = child.attrib['url']
            else:
                news[child.tag] = child.encode('utf8')
        newsitems.append(news)
    # return news items list
    return newsitems

def topStories():
    '''
    main function to generate and return news items
    '''
    # load rss feed
    rss = loadRSS()
    # parse XML
    newsitems = parseXML(rss)
    return newsitems
```
Answer (score: 2)
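You are trying to convert a `str` to `bytes` and then store those bytes in a dictionary. The problem is that the object you are calling this on is an `xml.etree.ElementTree.Element`, not a `str`.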
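A minimal reproduction of the error, assuming a made-up one-item snippet in place of the real feed: the child of `<item>` is an `Element`, which has no `encode()` method, while its `.text` attribute is an ordinary `str`.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for one RSS <item>, for illustration only.
item = ET.fromstring('<item><title>Hello</title></item>')
child = item[0]  # the <title> Element object, not a string

print(type(child).__name__)           # Element
print(hasattr(child, 'encode'))       # False: Element has no encode()
print(hasattr(child.text, 'encode'))  # True: child.text is the str 'Hello'

try:
    child.encode('utf8')  # the same call as in the question
except AttributeError as exc:
    print('AttributeError:', exc)
```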
You probably meant to get the text inside or around that element and then `encode()` it.
The documentation suggests using the `itertext()` method:
```python
''.join(child.itertext())
```
This evaluates to a `str`, which you can then `encode()`.
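To see why `itertext()` is preferable to reading `.text` directly, here is a sketch with a hypothetical `<description>` that contains nested markup: `.text` holds only the text before the first child element, while `itertext()` walks the element and all its subelements in document order.

```python
import xml.etree.ElementTree as ET

# Hypothetical element with nested markup, made up for illustration.
desc = ET.fromstring('<description>Breaking: <b>big</b> news today</description>')

print(repr(desc.text))           # 'Breaking: '  (only the text before <b>)
print(''.join(desc.itertext()))  # Breaking: big news today

# Encode only if you really need bytes at this point.
data = ''.join(desc.itertext()).encode('utf8')
print(type(data).__name__)       # bytes
```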
Note that the `text` and `tail` attributes may not contain text (emphasis added):
> Their values are usually strings but *may be any application-specific object*.
If you want to use these attributes, you have to handle `None` or non-string values:
```python
head = '' if child.text is None else str(child.text)
tail = '' if child.tail is None else str(child.tail)
# Do something with head and tail...
```
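A small sketch of those checks wrapped in a function; `head_and_tail` is a hypothetical helper name and the XML is made up. An empty element such as `<title/>` has `text` set to `None`, and an element with no trailing text has `tail` set to `None`.

```python
import xml.etree.ElementTree as ET

def head_and_tail(el):
    """Hypothetical helper: return (text, tail) as str, using '' for None."""
    head = '' if el.text is None else str(el.text)
    tail = '' if el.tail is None else str(el.tail)
    return head, tail

# Made-up markup: <title/> is empty but followed by text; <link> has no tail.
item = ET.fromstring('<item><title/>after title<link>http://example.com</link></item>')
title, link = item[0], item[1]

print(head_and_tail(title))  # ('', 'after title')
print(head_and_tail(link))   # ('http://example.com', '')
```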
Even that may not be enough.
If `text` or `tail` contains some unexpected `bytes` object (or text in the wrong encoding), this can raise a `UnicodeEncodeError`.
I suggest keeping the text as `str` rather than encoding it.
Encoding text into a `bytes` object should be the last step before writing it to a binary file, a network socket, or some other hardware.
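A sketch of "encode only at the boundary", assuming the news items are serialized as JSON (the serialization format and the sample data are illustrative assumptions, not part of the question). Everything stays `str` until the single `encode()` call; `io.BytesIO` stands in for a file opened in `'wb'` mode or a socket.

```python
import io
import json

# Made-up data in the shape parseXML returns.
newsitems = [{'title': 'Café opens', 'link': 'http://example.com/1'}]

# Work with str the whole time...
payload = json.dumps(newsitems, ensure_ascii=False)

# ...and encode once, right where bytes are required.
buf = io.BytesIO()  # stand-in for open(path, 'wb') or a socket
buf.write(payload.encode('utf-8'))

print(buf.getvalue().decode('utf-8') == payload)  # True
```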
For more on the difference between bytes and characters, see Ned Batchelder's "Pragmatic Unicode, or, How Do I Stop the Pain?" (a 36-minute video from PyCon US 2012). He covers both Python 2 and 3.
Using the `child.itertext()` method and not encoding the strings, I got a reasonable-looking list of dictionaries from `topStories()`:
```python
[
    ...,
    {'description': 'Ayushmann Khurrana says his five-year Bollywood journey has '
                    'been “a fun ride”; adds success is a lousy teacher while '
                    'failure is “your friend, philosopher and guide”.',
     'guid': 'http://www.hindustantimes.com/bollywood/i-am-a-hardcore-realist-and-that-s-why-i-feel-my-journey-has-been-a-joyride-ayushmann-khurrana/story-KQDR7gMuvhD9AeQTA7tbmI.html',
     'link': 'http://www.hindustantimes.com/bollywood/i-am-a-hardcore-realist-and-that-s-why-i-feel-my-journey-has-been-a-joyride-ayushmann-khurrana/story-KQDR7gMuvhD9AeQTA7tbmI.html',
     'media': 'http://www.hindustantimes.com/rf/image_size_630x354/HT/p2/2017/06/26/Pictures/actor-ayushman-khurana_24f064ae-5a5d-11e7-9d38-39c470df081e.JPG',
     'pubDate': 'Mon, 26 Jun 2017 10:50:26 GMT ',
     'title': "I am a hardcore realist, and that's why I feel my journey "
              'has been a joyride: Ayushmann...'},
]
```
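For reference, a sketch of the question's `parseXML` with the suggested change applied: the only difference is that `child.encode('utf8')` becomes `''.join(child.itertext())`, so every value stays a `str`. The tiny `sample` feed is made up so the function can be exercised without the network; it is not the real Hindustan Times feed.

```python
import xml.etree.ElementTree as ET

def parseXML(rss):
    '''
    parse an RSS feed into a list of dicts, keeping every value as str
    '''
    root = ET.fromstring(rss)
    newsitems = []
    for item in root.findall('./channel/item'):
        news = {}
        for child in item:
            if child.tag == '{http://search.yahoo.com/mrss/}content':
                # media:content carries its data in the url attribute
                news['media'] = child.attrib['url']
            else:
                # all text inside the element, as str; no encode() needed
                news[child.tag] = ''.join(child.itertext())
        newsitems.append(news)
    return newsitems

# Made-up minimal feed for illustration.
sample = b'''<rss xmlns:media="http://search.yahoo.com/mrss/"><channel>
<item><title>Hello</title><link>http://example.com/a</link>
<media:content url="http://example.com/a.jpg"/></item>
</channel></rss>'''

print(parseXML(sample))
# [{'title': 'Hello', 'link': 'http://example.com/a', 'media': 'http://example.com/a.jpg'}]
```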