I want to get the image src data of all the coming soon movies from this link:
Fandango.com
This is the code:
    def poster(genre):
        poster_link = []
        request = requests.get('http://www.fandango.com/moviescomingsoon?GenreFilter=genre')
        content = request.content
        soup = BeautifulSoup(content, "html.parser")
        soup2 = soup.find('div', {'class':'movie-ls-group'})
        elements = soup2.find_all('img')
        for element in elements:
            poster_link.append(element.get('src'))
        return poster_link
When I print the poster_link array, it gives me something other than the image sources.
Answer 0 (score: 1)
Try this. It quickly narrows the page down to a subset and grabs all images that have the appropriate class.
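    import requests
    from bs4 import BeautifulSoup

    def poster(genre):
        poster_link = []
        # Request the coming-soon listing for the given genre.
        request = requests.get('http://www.fandango.com/moviescomingsoon?GenreFilter=%s' % genre)
        content = request.content
        soup = BeautifulSoup(content, "html.parser")
        # Select only the images that carry the 'visual-thumb' class.
        imgs = soup.find_all('img', {'class': 'visual-thumb'})
        for img in imgs:
            # Read 'data-src' rather than 'src'.
            poster_link.append(img.get('data-src'))
        return poster_link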
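The switch from `src` to `data-src` is the key change in this answer. One plausible reading, which is an assumption and not something confirmed in this thread, is that the page lazy-loads its posters, so `src` holds a stand-in value while the real URL sits in `data-src`. Below is a minimal sketch of a helper (the name `image_url` and the sample URLs are hypothetical) that prefers `data-src` and falls back to `src`. It relies only on `.get()`, which both a BeautifulSoup tag and a plain dict provide, so it can be checked offline:

```python
def image_url(img):
    """Return the 'data-src' value if present, otherwise the plain 'src'.

    `img` is anything with a dict-style .get(): a BeautifulSoup Tag or,
    for a quick offline check, a plain dict of attributes.
    """
    return img.get('data-src') or img.get('src')


# Plain dicts standing in for <img> tags; the URLs are made up.
lazy = {'src': 'placeholder.gif', 'data-src': 'http://example.com/poster.jpg'}
eager = {'src': 'http://example.com/other.jpg'}

print(image_url(lazy))
print(image_url(eager))
```

Inside the loop, `poster_link.append(image_url(img))` would then cover both kinds of tag.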
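Answer 1 (score: 1)
James's answer is great, but I noticed that it grabs more images than that particular section contains: it also picks up the 'New + Coming Soon' section at the bottom of the page, which seems to fall outside the genre's scope and also appears on other pages. This code limits the image grabbing to the genre-specific coming soon section.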
    import requests
    from bs4 import BeautifulSoup

    def poster(genre):
        poster_link = []
        request = requests.get('http://www.fandango.com/moviescomingsoon?GenreFilter=' + genre)
        content = request.content
        soup = BeautifulSoup(content, "html.parser")
        # The first 'movie-ls-group' div is the genre-specific coming soon section.
        comingsoon = soup.find_all('div', {'class':'movie-ls-group'})
        # Search for poster thumbnails inside that section only.
        movies = comingsoon[0].find_all('img', {'class':'visual-thumb'})
        for movie in movies:
            poster_link.append(movie.get('data-src'))
        return poster_link

    print(poster('Horror'))
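You may also want to filter the 'emptysource.jpg' images out of the poster_link array before returning it, since they look like empty placeholders for movies that have no poster image.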
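A minimal sketch of that filtering step, assuming the placeholder can be recognised by the substring 'emptysource.jpg' in the URL. The helper name `drop_placeholders` and the sample URLs are made up for illustration:

```python
def drop_placeholders(links, placeholder='emptysource.jpg'):
    """Keep only links that are present and do not point at the placeholder."""
    return [link for link in links if link and placeholder not in link]


# Made-up sample data standing in for the poster_link list.
sample = [
    'http://example.com/posters/a.jpg',
    'http://example.com/images/emptysource.jpg',
    None,  # img.get('data-src') returns None when the attribute is missing
    'http://example.com/posters/b.jpg',
]

print(drop_placeholders(sample))
```

In `poster`, the last line would then become `return drop_placeholders(poster_link)`.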