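I have been using lxml with XPath to extract data from pages, and so far that has worked well. But I have a new challenge:
I need to extract the IDs of all the divs inside a containing div and put those ID names into a list. I guess I could use Beautiful Soup for this (or possibly lxml), but I am not sure how to go about it.
For example, here I would need to extract "beacon" and "lentil":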
<div id="live-events">
  <div class="events" id="beacon">
    ....other things...
  </div>
  <div class="events" id="lentil">
    ....other things...
  </div>
</div>
Any ideas?
Thanks!
Answer 0: (score: 0)
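This is quite straightforward with Beautiful Soup: find the container by its id, then collect the id of every div inside it: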
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup("""
... <div id="live-events">
...
... <div class ="events" id="beacon">
... ....other things...
... </div>
...
... <div class="events" id ="lentil">
... ....other things...
... </div>
...
... </div>
... """, "html.parser")
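>>> # Locate the container div by its id.
>>> live_events = soup.find(id="live-events")
>>> # id=True skips any nested div that has no id attribute (avoids a KeyError).
>>> ids = [div["id"] for div in live_events.find_all("div", id=True)]
>>> ids
['beacon', 'lentil']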
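
Since the question mentions already using lxml with XPath, here is a rough sketch of the same extraction done that way. The helper name `child_div_ids` and the `container_id` parameter are made up for illustration; the sample markup is the one from the question with the elided content left out. Note that this version only looks at divs that are direct children of the container, which is an assumption about what is wanted.

```python
from lxml import html

SAMPLE = """
<div id="live-events">
  <div class="events" id="beacon">....other things...</div>
  <div class="events" id="lentil">....other things...</div>
</div>
"""


def child_div_ids(html_text, container_id):
    """Return the id of each div that is a direct child of the container.

    Hypothetical helper for illustration, not part of lxml itself.
    """
    tree = html.fromstring(html_text)
    # $cid is an XPath variable, filled in from the keyword argument below.
    # "/div/@id" selects the id attribute of direct child divs only;
    # child divs without an id simply contribute nothing to the result.
    matches = tree.xpath('//div[@id=$cid]/div/@id', cid=container_id)
    return [str(value) for value in matches]


ids = child_div_ids(SAMPLE, "live-events")
print(ids)
```

If nested divs at any depth should be included as well, changing `/div/@id` to `//div/@id` in the expression should do it.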