I am trying to extract the title and the link inside the h3. Below is a snippet of the HTML content, followed by what I am doing:
<div class="post-inner wow bounceInUp animated" data-wow-offset='80' data-wow-delay="0s" data-wow-duration="0.8s">
<a href="https://url.com/hello/" class="post-link"></a>
<div class="post-pic lazyload" data-bg="https://url.com/wp-content/uploads/2019/01/opioid-300x200.jpg" *style="background-image: url('');" * /></div>
<div class="tags-wrapper">
<a href="/tag/hello-world">Hello World</a>
<a href="/tag/noob">Noob</a>
</div>
<h3>
<a href="https://url.com/hello/">
My First Title-Hello</a>
</h3>
</div>
I cannot get to the h3 tag here. While troubleshooting, if I remove
>>> from lxml.html import fromstring
>>> content = """
... <div class="post-inner wow bounceInUp animated" data-wow-offset='80' data-wow-delay="0s" data-wow-duration="0.8s">
... <a href="https://url.com/hello/" class="post-link"></a>
... <div class="post-pic lazyload" data-bg="https://url.com/wp-content/uploads/2019/01/opioid-300x200.jpg" *style="background-image: url('');" * /></div>
... <div class="tags-wrapper">
... <a href="/tag/hello-world">Hello World</a>
... <a href="/tag/noob">Noob</a>
... </div>
... <h3>
... <a href="https://url.com/hello/">
... My First Title-Hello</a>
... </h3>
... </div>"""
>>> html_response = fromstring(content)
>>> main_tag = html_response.xpath('//div[@class="post-inner wow bounceInUp animated"]')
>>> main_tag
[<Element div at 0x106b347e0>]
>>> main_tag[0].xpath('div')
[<Element div at 0x106b34788>]
>>> main_tag[0].xpath('a')
[<Element a at 0x106b34838>]
>>> main_tag[0].xpath('a/@href')
['https://url.com/hello/']
>>> main_tag[0].xpath('h3/a')
[]
>>> main_tag[0].xpath('h3')
[]
>>>
I am able to extract the tags.
Can someone help me with this?
Answer 0 (score: 2)
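The div you are capturing is closed at the end of the third line (note that the first div on that line ends with />). As a result, the h3 element you want to capture is not inside that div.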
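Two possible workarounds are sketched below, assuming you cannot fix the page itself. The markup is a trimmed copy of the snippet from the question, and the names broken, fixed, doc and fixed_link are only illustrative. The first option searches the whole document for the h3 instead of going through the captured div. The second strips the stray "/" before parsing, which is a string replacement tailored to this one snippet rather than a general HTML repair.

```python
from lxml.html import fromstring

# Trimmed copy of the snippet from the question; the stray "/" before ">"
# on the post-pic div is kept on purpose.
broken = """
<div class="post-inner wow bounceInUp animated">
<a href="https://url.com/hello/" class="post-link"></a>
<div class="post-pic lazyload" data-bg="https://url.com/wp-content/uploads/2019/01/opioid-300x200.jpg" /></div>
<div class="tags-wrapper">
<a href="/tag/hello-world">Hello World</a>
<a href="/tag/noob">Noob</a>
</div>
<h3>
<a href="https://url.com/hello/">
My First Title-Hello</a>
</h3>
</div>"""

# Option 1: search the whole document rather than the captured div,
# so it does not matter which element ends up as the parent of the h3.
doc = fromstring(broken)
link = doc.xpath('//h3/a')[0]
title = link.text_content().strip()
href = link.get('href')

# Option 2 (assumption: a cleanup specific to this snippet): drop the stray
# "/" so the post-pic div is opened and closed normally before parsing.
fixed = broken.replace(' /></div>', '></div>')
main_tag = fromstring(fixed).xpath(
    '//div[@class="post-inner wow bounceInUp animated"]')[0]
fixed_link = main_tag.xpath('h3/a')[0]

print(title, href)
print(fixed_link.text_content().strip(), fixed_link.get('href'))
```

Option 1 is usually the less fragile choice, because it does not depend on how the parser recovers from the malformed tag.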