Get the background image from a tag

Asked: 2018-04-01 09:35:14

Tags: python web-scraping data-retrieval

I want to download images from Flickr, and so far I have this:

import bs4, requests, re

img = requests.get('https://www.flickr.com/search/?text=nature')
img.raise_for_status()

soup = bs4.BeautifulSoup(img.text)

elem = soup.select('main div div div div')
elem[10]
<div class="view photo-list-photo-view requiredToShowOnServer awake" data-view-signature="photo-list-photo-view__UA_1__engagementModelName_photo-lite-models__excludePeople_true__id_8598154512__interactionViewName_photo-list-photo-interaction-view__isMobile_false__isOwner_false__layoutItem_1__measureAFT_true__model_1__modelParams_1__openAdvanced_false__parentContainer_1__parentSignature_photolist-47t__requiredToShowOnClient_true__requiredToShowOnServer_true__rowHeightMod_1__searchSimilar_true__searchSimilarWithTerm_false__searchTerm_nature__searchType_1__showAdvanced_true__showInteractionBarPlaceholder_false__showSort_true__showTools_true__sortMenuItems_1__unifiedSubviewParams_1__viewType_jst" style="transform: translate(277px, 191px); -webkit-transform: translate(277px, 191px); -ms-transform: translate(277px, 191px); width: 364px; height: 205px; background-image: url(//c1.staticflickr.com/9/8232/8598154512_a4e080002d.jpg)"> <div class="interaction-view"></div>

Can you help me get the background image URL out of the style attribute of elem?

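2 answers:

Answer 0: (score: 1)

To read the attributes of the tag, you can use elem[10].attrs, which is a dictionary that includes the style string.

Then split that string, or use a regular expression, to extract the background image URL.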

import bs4, requests, re

img = requests.get('https://www.flickr.com/search/?text=nature')
img.raise_for_status()

soup = bs4.BeautifulSoup(img.text)

elem = soup.select('main div div div div')
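# The style value ends with "background-image: url(//host/path.jpg)".
# After the split, [7:-1] drops the leading " url(//" and the closing ")".
print('https://' + elem[10].attrs['style'].split('background-image:')[-1][7:-1])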
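
The regular-expression route mentioned in this answer could look roughly like the sketch below. It is only an illustration: the helper name background_image_url and the pattern are assumptions, not part of the original answer, and the sample string is a shortened copy of the style attribute shown in the question. It avoids the fixed [7:-1] offsets, so it keeps working if background-image is not the last declaration.

```python
import re

# Assumed pattern: captures the value inside url(...) of a background-image
# declaration and tolerates optional quotes around the URL.
BACKGROUND_URL = re.compile(
    r"background-image:\s*url\(\s*['\"]?([^'\")]+?)['\"]?\s*\)"
)


def background_image_url(style):
    """Return the background-image URL from an inline style string, or None."""
    match = BACKGROUND_URL.search(style)
    if match is None:
        return None
    url = match.group(1)
    # The page uses protocol-relative URLs ("//host/path"), so add a scheme.
    if url.startswith("//"):
        url = "https:" + url
    return url


# Shortened copy of the style attribute from the question.
sample_style = (
    "width: 364px; height: 205px; "
    "background-image: url(//c1.staticflickr.com/9/8232/8598154512_a4e080002d.jpg)"
)
print(background_image_url(sample_style))
```

With the scraped page, the same helper would be called as background_image_url(elem[10].attrs['style']).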

Answer 1: (score: 1)

Something like the following:

import requests
from bs4 import BeautifulSoup

res = requests.get('https://www.flickr.com/search/?text=nature')
soup = BeautifulSoup(res.text,"lxml")
# Each photo tile carries its image as a background-image in the style attribute.
for items in soup.select(".photo-list-photo-view"):
    image = "https:" + items['style'].split("url(")[1].split(")")[0]
    print(image)

Partial output:

https://c1.staticflickr.com/3/2905/13955816048_7b31caa76f_n.jpg
https://c1.staticflickr.com/6/5603/15317528089_b124ffd236.jpg
https://c1.staticflickr.com/9/8298/7861351302_a9fef5f3b0_m.jpg
https://c1.staticflickr.com/8/7093/7293219226_6e36693123_m.jpg
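
Since the stated goal is to download the pictures, the extracted values still need a scheme and a local file name. The sketch below is one possible way to do that with the standard library only. The helper names resolve_image_url, local_filename and save_image are hypothetical, and save_image is untested against Flickr, which may reject requests without a browser-like User-Agent.

```python
import os
import posixpath
import shutil
from urllib.parse import urljoin, urlsplit
from urllib.request import urlopen

# The search page the tiles were scraped from (taken from the question).
PAGE_URL = "https://www.flickr.com/search/?text=nature"


def resolve_image_url(raw_url, page_url=PAGE_URL):
    """Turn a protocol-relative URL ("//host/path") into an absolute one."""
    return urljoin(page_url, raw_url)


def local_filename(image_url):
    """Use the last path segment of the URL as the file name."""
    return posixpath.basename(urlsplit(image_url).path)


def save_image(image_url, directory="."):
    """Download one image into directory and return the path written."""
    target = os.path.join(directory, local_filename(image_url))
    with urlopen(image_url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target


# Value as it appears inside url(...) in the page, without a scheme.
raw = "//c1.staticflickr.com/3/2905/13955816048_7b31caa76f_n.jpg"
absolute = resolve_image_url(raw)
print(absolute)
print(local_filename(absolute))
```

urljoin takes the scheme from the page URL, so the manual "https:" prefix used in the answers is no longer needed.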