How do I find a div with a specific id and iterate over its children using lxml?

Date: 2017-09-22 18:01:18

Tags: python lxml

I'm using Python's lxml and tried the following code to parse the page and get the elements I want, but it returns nothing:

from lxml import html
tree = html.fromstring(html_content)
posts = tree.xpath('//*[@id="posts"]/div')
for post in posts:
    print(post)

The HTML looks like this:

<div>
  <div>
    ...
     <div id="posts">
         <div>
             <div class="post"> 
                 <a href="">User 1</a>
                 <div class="content"> Content 1</div>
             </div>
             <div class="post"> 
                 <a href="">User 2</a>
                 <div class="content"> Content 2</div>
             </div>
             ...
         </div>
     </div>
   ...

I want to iterate over each post so I can access its <a> tag and its content <div>, and print:

 User 1
 Content 1

 User 2
 Content 2

 ...
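A minimal check of what the original XPath actually selects, assuming the sample markup above with the "..." placeholders removed (the real page may differ). Against that markup, '//*[@id="posts"]/div' matches only the single wrapper <div> directly under #posts, not the individual posts; iterating over that wrapper element is what yields the post divs.

```python
from lxml import html

# Assumption: the sample markup from the question, with the "..." placeholders removed.
html_content = """
<div>
  <div>
    <div id="posts">
      <div>
        <div class="post">
          <a href="">User 1</a>
          <div class="content"> Content 1</div>
        </div>
        <div class="post">
          <a href="">User 2</a>
          <div class="content"> Content 2</div>
        </div>
      </div>
    </div>
  </div>
</div>
"""

tree = html.fromstring(html_content)

# The question's XPath selects the direct <div> children of #posts,
# which here is just the one wrapper <div>, not the posts themselves.
wrappers = tree.xpath('//*[@id="posts"]/div')
print(len(wrappers))  # 1

# Iterating over an lxml element yields its child elements: the .post divs.
for post in wrappers[0]:
    print(post.get("class"))  # "post" for each child
```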

1 answer:

Answer 0 (score: 1)

It may be easier to select the elements with class "post" directly, using the same kind of XPath syntax:

posts = tree.xpath('//*[@id="posts"]/div/*[@class="post"]')
for post in posts:
    print(post.find('a').text)
    print(post.find('div').text)  # add .strip() to remove the leading space

Output:

User 1
 Content 1

User 2
 Content 2
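For reference, a self-contained Python 3 sketch of the same approach. It assumes the sample markup from the question with the "..." placeholders removed, and applies .strip() as the comment in the answer suggests, so the content lines lose their leading space. Collecting the (user, content) pairs in a list is an illustrative choice, not part of the original answer.

```python
from lxml import html

# Assumption: the question's sample markup with the "..." placeholders removed.
html_content = """
<div>
  <div>
    <div id="posts">
      <div>
        <div class="post">
          <a href="">User 1</a>
          <div class="content"> Content 1</div>
        </div>
        <div class="post">
          <a href="">User 2</a>
          <div class="content"> Content 2</div>
        </div>
      </div>
    </div>
  </div>
</div>
"""

tree = html.fromstring(html_content)

# Select every element with class="post" under the wrapper <div> inside #posts.
posts = tree.xpath('//*[@id="posts"]/div/*[@class="post"]')

results = []
for post in posts:
    user = post.find('a').text.strip()       # text of the direct child <a>
    content = post.find('div').text.strip()  # first direct child <div>, i.e. div.content
    results.append((user, content))
    print(user)
    print(content)
    print()  # blank line between posts, as in the desired output
```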