I'm using lxml to get the text inside tags, like this:
xpaths_for_questions_lxml = []
for tag in self.tree.iter():
    try:
        # tag.text is only the text before the tag's first child element
        if tag.text and utils.is_question(tag.text.strip()):
            xpaths_for_questions_lxml.append(self.tree.getpath(tag))
    except Exception as e:
        self.logger.debug(traceback.format_exc())
        raise  # re-raise the original exception so its type and traceback are kept
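The is_question function returns true if the sentence contains a question mark.
However, when the tag is a label, tag.text is empty, even though the label on the actual web page contains text.
Why doesn't the label tag show any text content? Or do I need to do something else to get the label's text?
EDIT1: My question is: I'm iterating over all the children in the DOM tree, so why isn't the text inside the label showing up?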
Answer 0 (score: 1)
If you want to get the questions, you can try
import requests
from lxml import html

r = requests.get('https://www.amctheatres.com/faqs/movie-info')
source = html.fromstring(r.text)
questions = source.xpath('//label[@itemprop="text"]/text()')
or
questions = [label.text_content() for label in source.xpath('//label[@itemprop="text"]')]
Note that because the label node contains multiple child text nodes, you should use label.text_content() instead of label.text.
print(questions)
#['Does the runtime shown for each movie include trailers?', 'Where can I find MPAA movie ratings information?', 'What does advertised showtime mean?', 'What movies are playing right now at AMC?', 'What movies are coming soon to AMC?', 'How can I find movie times at AMC?']
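To illustrate why label.text can be empty while the label still has visible text, here is a minimal sketch. The markup in snippet is hypothetical (it is not copied from the AMC page); it only assumes that the label starts with a child element, so the visible text is stored as that child's tail rather than as label.text. The question-mark check at the end is a stand-in for the asker's utils.is_question, whose real implementation is not shown in the post.

```python
from lxml import html

# Hypothetical markup: the label begins with a child element, so the
# visible text follows that child instead of preceding it.
snippet = (
    '<div><label itemprop="text"><span class="icon"></span>'
    'Does the runtime include trailers?</label></div>'
)

root = html.fromstring(snippet)
label = root.xpath('//label[@itemprop="text"]')[0]

# .text only holds the text that appears before the first child element.
print(repr(label.text))        # None

# The visible text is the tail of the child <span>.
print(repr(label[0].tail))

# text_content() joins all text nodes below the element.
print(label.text_content())

# The XPath text() step returns the label's direct text nodes as a list.
print(label.xpath('text()'))

# Applying the same idea to the loop from the question: test the full text
# content instead of .text (the endswith check is an assumed stand-in for
# utils.is_question).
tree = root.getroottree()
paths = [
    tree.getpath(el)
    for el in root.iter('label')
    if el.text_content().strip().endswith('?')
]
print(paths)
```

In the original loop, checking tag.text_content() on every element would also match ancestors whose descendants contain a question, so restricting the iteration to specific tags, as above, may be the safer choice.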