XPath problem: parsing child tags from a parent tag that has a style attribute

Time: 2019-02-04 07:32:34

Tags: python xpath lxml

Here is a snippet of the HTML content:

<div class="post-inner wow bounceInUp animated" data-wow-offset='80' data-wow-delay="0s" data-wow-duration="0.8s">
   <a href="https://url.com/hello/" class="post-link"></a>
   <div class="post-pic lazyload" data-bg="https://url.com/wp-content/uploads/2019/01/opioid-300x200.jpg" *style="background-image: url('');" * /></div>
   <div class="tags-wrapper">
      <a href="/tag/hello-world">Hello World</a>
      <a href="/tag/noob">Noob</a>
   </div>
   <h3>
      <a href="https://url.com/hello/">
      My First Title-Hello</a>
   </h3>
</div>

I am trying to extract the title and the link inside the h3. This is what I am doing:

>>> from lxml.html import fromstring
>>> content = """ <div class="post-inner wow bounceInUp animated" data-wow-offset='80' data-wow-delay="0s" data-wow-duration="0.8s">
... <a href="https://url.com/hello/" class="post-link"></a>
... <div class="post-pic lazyload" data-bg="https://url.com/wp-content/uploads/2019/01/opioid-300x200.jpg" *style="background-image: url('');" * /></div >
... <div class="tags-wrapper">
... <a href="/tag/hello-world">Hello World</a>
... <a href="/tag/noob">Noob</a>
... </div>
... <h3>
... <a href="https://url.com/hello/">
... My First Title-Hello</a>
... </h3>
... </div>"""
>>> html_response = fromstring(content)
>>> main_tag = html_response.xpath('//div[@class="post-inner wow bounceInUp animated"]')
>>> main_tag
[<Element div at 0x106b347e0>]
>>> main_tag[0].xpath('div')
[<Element div at 0x106b34788>]
>>> main_tag[0].xpath('a')
[<Element a at 0x106b34838>]
>>> main_tag[0].xpath('a/@href')
['https://url.com/hello/']
>>> main_tag[0].xpath('h3/a')
[]
>>> main_tag[0].xpath('h3')
[]

I cannot reach the h3 tag here. While troubleshooting, I found that if I remove the style attribute (the part marked with asterisks in the snippet above), I am able to extract the tags.

Can someone help me with this?

1 answer:

Answer 0 (score: 2)

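The div you are capturing is closed at the end of the third line (note that the first div on that line ends with />). As a result, the h3 element you want to capture is not inside that div.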
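
The point about the early-closed div suggests two possible workarounds, sketched below under some assumptions. The markup is a trimmed-down version of the snippet from the question (the asterisks and data-* attributes are dropped for brevity), and the string replacement used to repair it is a hypothetical fix for this specific snippet, not a general HTML cleaner. How the stray "/>" is handled may also vary with the parser version, so the sketch only relies on queries that do not depend on that.

```python
from lxml.html import fromstring

# Trimmed-down version of the question's markup.
# Note the stray "/" before ">" on the post-pic div.
broken = """
<div class="post-inner">
   <a href="https://url.com/hello/" class="post-link"></a>
   <div class="post-pic lazyload" style="background-image: url('');" /></div>
   <h3>
      <a href="https://url.com/hello/">
      My First Title-Hello</a>
   </h3>
</div>
"""

# Option 1: search the whole document instead of only the children
# of the outer div, so it does not matter where the h3 ended up.
doc = fromstring(broken)
link = doc.xpath('//h3/a')[0]
title = link.text_content().strip()
href = link.get('href')

# Option 2: repair the markup first (hypothetical fix: drop the stray "/"),
# so that the h3 is parsed as a child of the outer div.
fixed = broken.replace(' /></div>', '></div>')
outer = fromstring(fixed).xpath('//div[@class="post-inner"]')[0]
child_hrefs = outer.xpath('h3/a/@href')

print(title)
print(href)
print(child_hrefs)
```

Option 1 is the less invasive choice when the page source cannot be changed; Option 2 keeps the relative XPath from the question (h3/a) working, at the cost of editing the HTML before parsing.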