I am using the Python lxml library. I tried the following code to parse the page and get the elements I want, but it returns nothing:
from lxml import html
tree = html.fromstring(html_content)
posts = tree.xpath('//*[@id="posts"]/div')
for post in posts:
    print(post)
The HTML looks like this:
<div>
<div>
...
<div id="posts">
<div>
<div class="post">
<a href="">User 1</a>
<div class="content"> Content 1</div>
</div>
<div class="post">
<a href="">User 2</a>
<div class="content"> Content 2</div>
</div>
...
</div>
</div>
...
I want to iterate over each post so that I can access the text of its <a> tag and the content of its <div>. I would like to print:
User 1
Content 1
User 2
Content 2
...
Answer 0 (score: 1)
It may be easier to target the tags with class post directly, using similar syntax:
posts = tree.xpath('//*[@id="posts"]/div/*[@class="post"]')
for post in posts:
    print(post.find('a').text)
    print(post.find('div').text)  # add .strip() to remove the leading space
Output:
User 1
Content 1
User 2
Content 2
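
For reference, here is a minimal self-contained sketch of the same idea. It assumes the markup matches the structure shown in the question (the html_content string below is a hypothetical sample built from it), and extract_posts is an illustrative helper name, not part of lxml. It uses // after the id match so the intermediate wrapper <div> does not need to be spelled out, and findtext with a default so a post with a missing child yields an empty string rather than raising an error.

```python
from lxml import html

# Hypothetical sample markup, modelled on the structure in the question.
html_content = """
<div>
  <div id="posts">
    <div>
      <div class="post">
        <a href="">User 1</a>
        <div class="content"> Content 1</div>
      </div>
      <div class="post">
        <a href="">User 2</a>
        <div class="content"> Content 2</div>
      </div>
    </div>
  </div>
</div>
"""


def extract_posts(markup):
    """Return a list of (user, content) tuples, one per post element."""
    tree = html.fromstring(markup)
    results = []
    # Match every div with class "post" anywhere below the element with id "posts".
    for post in tree.xpath('//*[@id="posts"]//div[@class="post"]'):
        # findtext returns the default when the child element is missing.
        user = post.findtext('a', default='').strip()
        content = post.findtext('div[@class="content"]', default='').strip()
        results.append((user, content))
    return results


for user, content in extract_posts(html_content):
    print(user)
    print(content)
```

Note that @class="post" only matches when the class attribute is exactly post. If the real page uses several classes on the same element, the predicate would need adjusting, for example with contains(@class, "post").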