How do I get image src data with BeautifulSoup?

Time: 2016-08-28 17:48:40

Tags: python beautifulsoup

I want to get the image src data for all of the "coming soon" movies from this link: Fandango.com

This is the code:

def poster(genre):
    poster_link = []
    request = requests.get('http://www.fandango.com/moviescomingsoon?GenreFilter=genre')
    content = request.content
    soup = BeautifulSoup(content, "html.parser")
    soup2 = soup.find('div', {'class':'movie-ls-group'})
    elements = soup2.find_all('img')
    for element in elements:
        poster_link.append(element.get('src'))
    return poster_link

When I print the poster_link array, the output is not the image sources I expect.

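2 answers:

Answer 0 (score: 1)

Try this. It skips the intermediate container lookup and grabs every image with the appropriate class, reading the URL from the data-src attribute instead of src.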

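import requests
from bs4 import BeautifulSoup

def poster(genre):
    poster_link = []
    # Insert the genre into the query string.
    request = requests.get('http://www.fandango.com/moviescomingsoon?GenreFilter=%s' % genre)
    content = request.content
    soup = BeautifulSoup(content, "html.parser")
    # Select every poster thumbnail on the page by its class.
    imgs = soup.find_all('img', {'class': 'visual-thumb'})

    for img in imgs:
        # The image URL is read from 'data-src' rather than 'src'.
        poster_link.append(img.get('data-src'))
    return poster_link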
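
To see why reading data-src instead of src makes a difference, here is a minimal offline sketch. The HTML is made up for illustration and is not Fandango's actual markup. It assumes a lazy-loading pattern in which src holds a placeholder and data-src holds the real poster URL. The example.com URLs are hypothetical as well.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed for illustration only.
SAMPLE_HTML = """
<div class="movie-ls-group">
  <img class="visual-thumb" src="/images/emptysource.jpg"
       data-src="http://example.com/posters/movie-a.jpg">
  <img class="visual-thumb" src="/images/emptysource.jpg"
       data-src="http://example.com/posters/movie-b.jpg">
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
imgs = soup.find_all("img", {"class": "visual-thumb"})

# 'src' yields the placeholder; 'data-src' yields the real URL.
src_values = [img.get("src") for img in imgs]
data_src_values = [img.get("data-src") for img in imgs]

print(src_values)
print(data_src_values)
```

If the live page follows the same pattern, this would explain why collecting src returns something other than the poster URLs.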

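Answer 1 (score: 1)

James's answer is great, but I noticed that it grabs more images than that particular section contains: it also picks up the "New + Coming Soon" section at the bottom of the page, which seems to fall outside the chosen genre and appears on other pages as well. This code restricts the image scraping to the genre-specific coming-soon section.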

import requests
from bs4 import BeautifulSoup

def poster(genre):
    poster_link = []
    request = requests.get('http://www.fandango.com/moviescomingsoon?GenreFilter=' + genre)
    content = request.content
    soup = BeautifulSoup(content, "html.parser")
    # The first 'movie-ls-group' div is the genre-specific coming-soon section.
    comingsoon = soup.find_all('div', {'class':'movie-ls-group'})
    # Search for thumbnails only inside that section.
    movies = comingsoon[0].find_all('img', {'class':'visual-thumb'})
    for movie in movies:
        poster_link.append(movie.get('data-src'))
    return poster_link

print(poster('Horror'))

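You may also want to filter the 'emptysource.jpg' images out of the poster_link array before returning it, since they appear to be empty placeholders for movies that have no poster image.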
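
One possible way to do that filtering is sketched below. The helper name drop_placeholders and the sample URLs are hypothetical. Matching on the substring 'emptysource.jpg' is an assumption about how the placeholder appears in the returned links. The helper also drops None entries, which img.get('data-src') returns when the attribute is missing.

```python
def drop_placeholders(links, placeholder="emptysource.jpg"):
    """Return the links that are neither missing nor placeholder images."""
    return [link for link in links if link and placeholder not in link]


# Hypothetical sample data standing in for the scraped poster_link list.
sample = [
    "http://example.com/posters/movie-a.jpg",
    "http://example.com/images/emptysource.jpg",
    None,
    "http://example.com/posters/movie-b.jpg",
]

filtered = drop_placeholders(sample)
print(filtered)
```

Inside poster(), this would amount to ending the function with return drop_placeholders(poster_link).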