I am trying to extract the text of
<im:rating>5</im:rating>
<im:version>1.14</im:version>
from Apple's XML feed of App Store reviews, using BeautifulSoup.
My code is:
import requests
from bs4 import BeautifulSoup

def getReview():
    url = "https://itunes.apple.com/rss/customerreviews/page=1/id=511376996/sortby=mostrecent/xml?l=en&cc=us"
    source = requests.get(url)
    text = source.text
    soup = BeautifulSoup(text, 'xml')
    for l in soup.findAll('entry'):
        # find() returns a Tag, or None when the entry has no such element
        rate = l.find('rating')
        author = (l.find('name')).text
        appver = l.find('version')
        print(rate)
        print(author)
        print(appver)
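When I use the code above, I get the author's name as plain text, but the rating and version are printed with their tags:
<im:rating>5</im:rating>
<im:version>1.14</im:version>
If I use appver=l.find('version').text instead, it raises an error:
appver=l.find('version').text
AttributeError: 'NoneType' object has no attribute 'text'
I want only the text values of the rating and version, i.e. '5' for the rating and '1.14' for the version.
Any help is appreciated. Thanks in advance.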
Answer 0 (score: 0)
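If you just want to get at these tags, a simple pyparsing parser will pull them out without the BeautifulSoup hoops to jump through. By parsing only for the given tags (pyparsing's tag matching is pretty comprehensive), you avoid the overhead of parsing the entire document, grab just the parts you want, and can then put them into a simplified structure of your own design. See below, with comments, run against the posted sample: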
from pyparsing import makeHTMLTags, SkipTo, ungroup

def get_tag_body(start_tag, end_tag):
    return ungroup(start_tag.suppress() + SkipTo(end_tag) + end_tag.suppress())

# makeHTMLTags returns a 2-tuple containing expressions for the
# corresponding start tag and end tag
rating_expr = get_tag_body(*makeHTMLTags("im:rating"))("rating")
version_expr = get_tag_body(*makeHTMLTags("im:version"))("version")

# the desired pattern is the rating_expr followed by the version_expr
search_parser = rating_expr + version_expr

# parse the posted sample
sample = """
<im:rating>5</im:rating>
<im:version>1.14</im:version>
"""

# access the named fields using dot notation or dict key notation
results = search_parser.searchString(sample)
if results:
    for res in results:
        print("rating = {rating}, version = {version}".format_map(res))
Prints:
rating = 5, version = 1.14
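
If you would rather keep the per-entry structure, so that each rating and version stays paired with its author, a namespace-aware lookup with a None check is another option. The sketch below swaps BeautifulSoup for the standard library's xml.etree.ElementTree so it runs without extra dependencies. The namespace URIs, the SAMPLE feed, and the extract_reviews helper are assumptions made for illustration, not taken from the question, so check them against the real feed. It also assumes that the AttributeError comes from at least one entry that has no im:version element, which is plausible but not confirmed by the question.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed for this sketch; verify them against the real feed.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "im": "http://itunes.apple.com/rss",
}

# Hypothetical, trimmed-down feed: the first entry has no rating or version.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom" xmlns:im="http://itunes.apple.com/rss">
  <entry>
    <title>Some App</title>
  </entry>
  <entry>
    <author><name>alice</name></author>
    <im:rating>5</im:rating>
    <im:version>1.14</im:version>
  </entry>
</feed>
"""

def extract_reviews(xml_text):
    """Return (author, rating, version) tuples, skipping incomplete entries."""
    root = ET.fromstring(xml_text)
    reviews = []
    for entry in root.findall("atom:entry", NS):
        rating = entry.find("im:rating", NS)
        version = entry.find("im:version", NS)
        # find() returns None when the element is missing, so check before .text
        if rating is None or version is None:
            continue
        author = entry.findtext("atom:author/atom:name", default="", namespaces=NS)
        reviews.append((author, rating.text, version.text))
    return reviews

reviews = extract_reviews(SAMPLE)
for author, rating, version in reviews:
    print(f"{author}: rating = {rating}, version = {version}")
```

The same guard applies to the BeautifulSoup version: test the result of find() for None before reading .text.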