Question

I'm using Python's lxml library and tried the following code to parse the page and get the elements I want, but it returns nothing:

from lxml import html

# html_content holds the page source as a string
tree = html.fromstring(html_content)
posts = tree.xpath('//*[@id="posts"]/div')
for post in posts:
    print(post)

The HTML looks like this:

<div>
  <div>
    ...
     <div id="posts">
         <div>
             <div class="post"> 
                 <a href="">User 1</a>
                 <div class="content"> Content 1</div>
             </div>
             <div class="post"> 
                 <a href="">User 2</a>
                 <div class="content"> Content 2</div>
             </div>
             ...
         </div>
     </div>
   ...

I want to iterate over each post so that I can access the <a> tag and the content <div>. I want to print:

 User 1
 Content 1

 User 2
 Content 2

 ...
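To see what the question's expression actually selects, the sketch below runs it against the sample markup, with the elided "..." parts removed and every tag closed (that reconstruction is an assumption). On this sample the result is not empty: '//*[@id="posts"]/div' matches only the single wrapper <div> directly under #posts, so the loop would print one element rather than one per post. If the real page still returns nothing, its HTML presumably differs from the excerpt (for example, if the posts are inserted by JavaScript, which lxml does not execute); the question does not show enough to confirm this.

```python
from lxml import html

# Sample markup from the question; the "..." parts are dropped and every tag is closed
html_content = """
<div>
  <div>
    <div id="posts">
      <div>
        <div class="post">
          <a href="">User 1</a>
          <div class="content"> Content 1</div>
        </div>
        <div class="post">
          <a href="">User 2</a>
          <div class="content"> Content 2</div>
        </div>
      </div>
    </div>
  </div>
</div>
"""

tree = html.fromstring(html_content)
posts = tree.xpath('//*[@id="posts"]/div')

# "/div" selects only the direct <div> children of #posts: the one wrapper div
print(len(posts))  # 1
# The wrapper has no class attribute
print(posts[0].get("class"))  # None
# The class="post" divs sit one level deeper, as children of the wrapper
print([child.get("class") for child in posts[0]])  # ['post', 'post']
```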

Answer 1

It may be easier to select the elements with class "post" directly, using similar XPath syntax:

posts = tree.xpath('//*[@id="posts"]/div/*[@class="post"]')
for post in posts:
    print(post.find('a').text)
    print(post.find('div').text)  # add .strip() to remove the leading space
    print()  # blank line between posts

Output:

User 1
 Content 1

User 2
 Content 2
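A self-contained sketch of the answer's approach, run against the question's sample markup (reconstructed with the "..." parts removed, which is an assumption). It applies strip() so the content lines print without the leading space, and it selects the content div by its class instead of relying on it being the first child <div>.

```python
from lxml import html

# Sample markup from the question, with the "..." parts removed and tags closed
html_content = """
<div>
  <div>
    <div id="posts">
      <div>
        <div class="post">
          <a href="">User 1</a>
          <div class="content"> Content 1</div>
        </div>
        <div class="post">
          <a href="">User 2</a>
          <div class="content"> Content 2</div>
        </div>
      </div>
    </div>
  </div>
</div>
"""

tree = html.fromstring(html_content)

# Select the class="post" elements themselves rather than their wrapper div
posts = tree.xpath('//*[@id="posts"]/div/*[@class="post"]')

results = []
for post in posts:
    # findtext() returns the text of the first matching child; strip() drops the leading space
    user = post.findtext('a').strip()
    content = post.findtext('div[@class="content"]').strip()
    results.append((user, content))

for user, content in results:
    print(user)
    print(content)
    print()  # blank line between posts
```

findtext() returns None when no child matches, so if some posts might lack an <a> or a content div, passing a default such as findtext('a', default='') avoids an AttributeError on strip().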

How to find a div with a specific id and iterate over its children using lxml?
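For that exact task (find the div by its id, then walk its children), lxml.html elements also provide get_element_by_id(), and iterchildren() walks direct child elements. The nested loop below mirrors the sample's layout (#posts, then a wrapper div, then the post divs); that two-level structure is taken from the excerpt and may need adjusting for the real page.

```python
from lxml import html

# Sample markup from the question, with the "..." parts removed and tags closed
html_content = """
<div>
  <div>
    <div id="posts">
      <div>
        <div class="post">
          <a href="">User 1</a>
          <div class="content"> Content 1</div>
        </div>
        <div class="post">
          <a href="">User 2</a>
          <div class="content"> Content 2</div>
        </div>
      </div>
    </div>
  </div>
</div>
"""

tree = html.fromstring(html_content)

# lxml.html convenience method; raises KeyError if no element has this id
posts_div = tree.get_element_by_id("posts")

results = []
# iterchildren("div") yields only the direct <div> child elements
for wrapper in posts_div.iterchildren("div"):
    for post in wrapper.iterchildren("div"):
        # Treat "post" as one of possibly several space-separated classes
        if "post" not in (post.get("class") or "").split():
            continue
        user = post.findtext("a").strip()
        content = post.findtext('div[@class="content"]').strip()
        results.append((user, content))

for user, content in results:
    print(user)
    print(content)
    print()  # blank line between posts
```

Splitting the class attribute also matches an element like a hypothetical class="post featured", which an exact [@class="post"] test would skip.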
