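Here is my sample HTML code.
I need to parse the HTML file using HtmlXPathSelector:
from scrapy.selector import HtmlXPathSelector

def parse(self, response):
    edxData = HtmlXPathSelector(response)
Sample HTML response data: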
<html>
<body>
<h2 class="title course-title">
<a href="https://www.edx.org/course/mitx/mitx-14-73x-challenges-global-poverty-1350">The Challenges of Global Poverty
</a>
</h2>
<div class="subtitle course-subtitle copy-detail">A course for those who are interested in the challenge posed by massive and persistent world poverty.
</div>
</body>
</html>
Answer (score: 1)
One way is to loop over the matching h2 tags and select the tags inside each one:
>>> for h2 in sel.xpath('//h2[@class = "title course-title"]'):
...     print(h2.xpath('a'))
...
[<Selector xpath='a' data=u'<a href="https://www.edx.org/course/mitx'>]
Or, even more simply:
>>> sel.xpath('//h2[@class = "title course-title"]/a')
[<Selector xpath='//h2[@class = "title course-title"]/a' data=u'<a href="https://www.edx.org/course/mitx'>]
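Both Scrapy variants above need a live selector (sel). As a dependency-free sketch of the same idea, the snippet below uses Python's standard xml.etree.ElementTree, which supports a limited XPath subset, to loop over the h2 tags and pull out each link's text and href. This is a swapped-in technique, not Scrapy's API, and it only works because the question's sample happens to be well-formed XML; course_links and SAMPLE_HTML are hypothetical names.

```python
import xml.etree.ElementTree as ET

# The sample from the question. Assumption: it is well-formed XML, which is
# the only reason the stdlib XML parser can read it; real pages usually are not.
SAMPLE_HTML = """<html>
<body>
<h2 class="title course-title">
<a href="https://www.edx.org/course/mitx/mitx-14-73x-challenges-global-poverty-1350">The Challenges of Global Poverty
</a>
</h2>
<div class="subtitle course-subtitle copy-detail">A course for those who are interested in the challenge posed by massive and persistent world poverty.
</div>
</body>
</html>"""


def course_links(markup):
    """Return (title, url) pairs, one per <h2 class="title course-title">."""
    root = ET.fromstring(markup)
    pairs = []
    # Exact match on the whole class string, like @class = "..." in the XPath above.
    for h2 in root.findall(".//h2[@class='title course-title']"):
        # Select the <a> child relative to this <h2>, mirroring h2.xpath('a').
        a = h2.find("a")
        if a is not None:
            # .text keeps the trailing newline from the markup, so strip it.
            pairs.append(((a.text or "").strip(), a.get("href")))
    return pairs


print(course_links(SAMPLE_HTML))
```

For real-world HTML that is not valid XML, Scrapy's own selectors (as used in the answer) are the better tool.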
To select the other element (the subtitle div), just run:
>>> sel.xpath('//div[@class="subtitle course-subtitle copy-detail"]')
[<Selector xpath='//div[@class="subtitle course-subtitle copy-detail"]' data=u'<div class="subtitle course-subtitle cop'>]
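The same subtitle lookup can be sketched with the standard library as well, again assuming the markup is well-formed XML. course_subtitle is a hypothetical helper, and the sample below is trimmed from the question to just the subtitle div; ElementTree stands in for Scrapy here and is not how Scrapy evaluates the XPath.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's sample (assumption: only the subtitle <div>
# matters here); it must be well-formed XML for ElementTree to parse it.
SAMPLE_HTML = """<html>
<body>
<div class="subtitle course-subtitle copy-detail">A course for those who are interested in the challenge posed by massive and persistent world poverty.
</div>
</body>
</html>"""


def course_subtitle(markup):
    """Return the stripped text of the first subtitle <div>, or None if absent."""
    root = ET.fromstring(markup)
    # Exact match on the whole class string, like @class="..." in the XPath above.
    div = root.find(".//div[@class='subtitle course-subtitle copy-detail']")
    if div is None:
        return None
    # itertext() also collects text inside nested tags, if there are any.
    return "".join(div.itertext()).strip()


print(course_subtitle(SAMPLE_HTML))
```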
It looks like you are using Scrapy; please tag this question as