Question

I am using lxml to get the text inside tags, like this:

    xpaths_for_questions_lxml = []
    for tag in self.tree.iter():
        try:
            # tag.text holds only the text that precedes the element's first child
            if tag.text and utils.is_question(tag.text.strip()):
                xpaths_for_questions_lxml.append(self.tree.getpath(tag))
        except Exception:
            self.logger.debug(traceback.format_exc())
            raise  # re-raise the original exception rather than a bare Exception

The is_question helper returns True if the sentence contains a question mark.
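The real utils.is_question is not shown here, so the following is only a hypothetical stand-in that matches the description above (it simply checks for a question mark):

```python
def is_question(text: str) -> bool:
    """Hypothetical stand-in: True if the text contains a question mark."""
    return "?" in text


print(is_question("What movies are playing right now at AMC?"))  # True
print(is_question("Movie info"))                                 # False
```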

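However, when the tag is a label, tag.text is empty, even though the label on the actual web page does contain text.

Why does the label tag show no text content? Do I need to do something else to get the text of label tags?

EDIT1: My question is this: I am iterating over every element in the DOM tree, so why does the text inside the label not show up?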

Answer 1

If you want to get the questions, you can try:

import requests
from lxml import html

r = requests.get('https://www.amctheatres.com/faqs/movie-info')
source = html.fromstring(r.text)
questions = source.xpath('//label[@itemprop="text"]/text()')

or

questions = [label.text_content() for label in source.xpath('//label[@itemprop="text"]')]

Note that because the label node contains several child text nodes, you should use label.text_content() rather than label.text.

print(questions)
#['Does the runtime shown for each movie include trailers?', 'Where can I find MPAA movie ratings information?', 'What does advertised showtime mean?', 'What movies are playing right now at AMC?', 'What movies are coming soon to AMC?', 'How can I find movie times at AMC?']
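The difference between label.text and label.text_content() can be seen on a small example. This is a minimal sketch: the markup is hypothetical (not the real AMC page), and own_text is an illustrative helper, not part of lxml. In lxml, element.text is only the text before the first child; text that follows a child element is stored in that child's tail.

```python
from lxml import html

# Hypothetical markup: the label's visible text comes after a child <input>,
# so it is stored as the child's tail rather than as label.text.
snippet = (
    '<div><label itemprop="text">'
    '<input type="checkbox"/>Does the runtime include trailers?'
    '</label></div>'
)
root = html.fromstring(snippet)
label = root.find('.//label')


def own_text(element):
    """Illustrative helper: join the text nodes that are direct children."""
    parts = [element.text] + [child.tail for child in element]
    return ''.join(part for part in parts if part)


print(repr(label.text))      # text before the first child
print(repr(label[0].tail))   # text after the <input> child
print(label.text_content())  # text of the label and all its descendants
print(own_text(label))       # only the label's own direct text
```

In a loop over tree.iter(), something like own_text(tag) may be a better fit than text_content(), because text_content() on an ancestor such as body also includes the label's text, so every ancestor would match as well.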

How do I retrieve the text inside a label tag with lxml?
