Question

I think this is a fairly simple question.

I retrieved a file from gdata; the file is: https://gdata.youtube.com/feeds/api/videos/Ej4_G-E1cAM/comments

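I'm trying to pick out the text between the "<author>HERE</author>" tags, so that I'm left with output containing only the usernames. Is Python even the best way to go about this, or should I use another language? I've been googling since 8:00 am (4 hours) and have yet to find anything for what seems like a simple task.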

Best regards, Mitch Powell

Answer 1

You have an Atom feed, so I'd use feedparser to handle it:

import feedparser

result = feedparser.parse('https://gdata.youtube.com/feeds/api/videos/Ej4_G-E1cAM/comments')
for entry in result.entries:
    print(entry.author)

Prints:

FreebieFM
micromicros
FreebieFM
Sarah Grimstone
FreebieFM
# etc.

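Feedparser is an external library, but it is easy to install. If you must use only the standard library, you can use the ElementTree API, but to parse Atom feeds you need to include HTML entities in the parser, and you will have to deal with namespaces (not one of ElementTree's strong points):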

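from urllib2 import urlopen  # Python 2; on Python 3 use: from urllib.request import urlopen
from xml.etree import ElementTree

response = urlopen('https://gdata.youtube.com/feeds/api/videos/Ej4_G-E1cAM/comments')
tree = ElementTree.parse(response)

# Map the 'a' prefix to the Atom namespace so it can be used in the path below
nsmap = {'a': 'http://www.w3.org/2005/Atom'}
for author in tree.findall('.//a:author/a:name', namespaces=nsmap):
    print(author.text)

The nsmap dictionary lets ElementTree translate the a: prefix to the correct namespace for those elements.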
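To try the namespace handling without hitting the network, here is a minimal Python 3 sketch that runs the same lookup against an inline string. The SAMPLE_FEED document and the author_names helper are made up for illustration; the sample only assumes the real feed nests each username as author/name in the Atom namespace, as the path above implies.

```python
from xml.etree import ElementTree

# Hypothetical, hand-written sample shaped like an Atom comments feed;
# it is not real data from the gdata URL.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <author><name>FreebieFM</name></author>
    <content>First comment</content>
  </entry>
  <entry>
    <author><name>micromicros</name></author>
    <content>Second comment</content>
  </entry>
</feed>
"""

# Prefix-to-namespace map; the prefix 'a' is arbitrary, the URI is the Atom namespace.
ATOM_NS = {'a': 'http://www.w3.org/2005/Atom'}


def author_names(xml_text):
    """Return the text of every author/name element in an Atom document."""
    root = ElementTree.fromstring(xml_text)
    return [name.text
            for name in root.findall('.//a:author/a:name', namespaces=ATOM_NS)]


names = author_names(SAMPLE_FEED)
print(names)
```

Without the namespace map, a path such as './/author/name' would match nothing here, because every element in the feed lives in the Atom namespace.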
