I am trying to grab the first link under the article tag. So far I have this:

    import re

    for link in soup.find("section", {"id": "grid"}).findAll("a", href=re.compile("/recipe/[0-9]*/.*/")):
        if 'href' in link.attrs:
            print(link.attrs['href'])

This grabs both links under the article tag:
<a href="/recipe/17066/janets-rich-banana-bread/" data-internal-referrer-link='hub recipe' data-click-id='cardslot 2' >
<img class="grid-col__rec-image" data-lazy-load data-original-src="http://images.media-allrecipes.com/userphotos/250x250/00/17/17/171761.jpg" alt="Janet's Rich Banana Bread Recipe and Video - Sour cream guarantees a moist and tender loaf. And bananas are sliced instead of mashed in this recipe, giving a concentrated banana taste in every bite." title="Janet's Rich Banana Bread Recipe and Video" src="http://images.media-allrecipes.com/ar/spacer.gif" style="display: inline;" />
<h3 class="grid-col__h3 grid-col__h3--recipe-grid">
Janet's Rich Banana Bread
<div class="grid-col__video">
<a href="/video/1027/janets-rich-banana-bread/" data-internal-referrer-link='hub recipe' data-click-id='cardslot 2'><span class="icon--videoplay-small-white"></span></a>
</div>
</h3>
</a>
<a href="/recipe/17066/janets-rich-banana-bread/" data-internal-referrer-link='hub recipe' data-click-id='cardslot 2'>
<div class="grid-col__ratings">
<div class="rating-stars" data-scroll-to-anchor="reviews" data-ratingstars= 4.82000017166138 >
<img height="16" width="16" src="http://images.media-allrecipes.com/ar-images/icons/rating-stars/full-star-2015.svg" />
<img height="16" width="16" src="http://images.media-allrecipes.com/ar-images/icons/rating-stars/full-star-2015.svg" />
<img height="16" width="16" src="http://images.media-allrecipes.com/ar-images/icons/rating-stars/full-star-2015.svg" />
<img height="16" width="16" src="http://images.media-allrecipes.com/ar-images/icons/rating-stars/full-star-2015.svg" />
<img height="16" width="16" src="http://images.media-allrecipes.com/ar-images/icons/rating-stars/full-star-2015.svg" />
As you can see, there are two links there, and I am trying to get only the first one. Any help would be greatly appreciated!
Answer 0 (score: 0)
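The 'find' function always returns a single element (the first match, or None if nothing matches), whereas 'findAll' returns a list of all matching elements (in this case, all the links). So you can either use the limit parameter of findAll, which gives you a list containing at most one element:

    first_link = soup.findAll("a", limit=1)

or call find, which gives you the element itself:

    first_link = soup.find("a")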
Reference: https://www.crummy.com/software/BeautifulSoup/bs3/documentation.html#Searching%20the%20Parse%20Tree
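To make the difference concrete, here is a minimal sketch of both approaches applied to markup shaped like the question's. It assumes BeautifulSoup 4 (the `bs4` package), where `find_all` is the current spelling of `findAll`. The `SAMPLE_HTML` string is a trimmed, hypothetical stand-in for the real page, and the function names `first_recipe_link` and `first_recipe_link_as_list` are made up for illustration.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup modelled on the snippet in the question.
SAMPLE_HTML = """
<section id="grid">
  <article>
    <a href="/recipe/17066/janets-rich-banana-bread/">
      <h3>Janet's Rich Banana Bread</h3>
    </a>
    <a href="/recipe/17066/janets-rich-banana-bread/">
      <div class="grid-col__ratings"></div>
    </a>
  </article>
</section>
"""

# Same idea as the pattern in the question: only match recipe URLs.
RECIPE_HREF = re.compile(r"/recipe/[0-9]+/.*/")


def first_recipe_link(html):
    """Return the href of the first recipe link in the first <article>, or None."""
    soup = BeautifulSoup(html, "html.parser")
    article = soup.find("article")
    if article is None:
        return None
    # find() stops at the first match and returns the tag itself (or None).
    link = article.find("a", href=RECIPE_HREF)
    return link["href"] if link is not None else None


def first_recipe_link_as_list(html):
    """Same lookup with find_all(limit=1), which returns a list of at most one tag."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=RECIPE_HREF, limit=1)]


if __name__ == "__main__":
    print(first_recipe_link(SAMPLE_HTML))
    print(first_recipe_link_as_list(SAMPLE_HTML))
```

The practical difference is the return type: with `find` you get a single tag and must handle `None`, while `find_all(..., limit=1)` always gives a list, so you index or iterate it. To get one link per article on a page with many articles, calling `find` on each article in turn is probably the simpler route.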