I'm trying to build an image scraper and was wondering whether someone could help with the following example:
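import requests
from lxml import html

page = requests.get('http://www.example.com/image1')  # requests needs an explicit scheme
tree = html.fromstring(page.text)
pic = tree.xpath(Copied XPath)  # placeholder: the XPath copied from the browser, as a string
print(pic[0].attrib['src'])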
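Here `page` is fetched from the URL of one picture, 'www.example.com/image1'. If I have a list of image names, for example image2, image3, image4 and so on, is it possible to loop this process over them?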
Answer 0 (score: 1)
Yes, it's possible:
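import requests
from lxml import html

list_of_image_names = ['image1', 'image2', 'image3']
for image_name in list_of_image_names:
    # requests needs an explicit scheme such as http://
    page = requests.get('http://www.example.com/' + image_name)
    tree = html.fromstring(page.text)
    pic = tree.xpath(Copied XPath)  # placeholder: the XPath copied from the browser, as a string
    print(pic[0].attrib['src'])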
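One detail the loop depends on: `requests.get` rejects a URL that has no scheme, and the `src` you get back may be relative to the page. A minimal sketch of handling both with the standard library's `urljoin` follows. `BASE_URL`, `page_url`, and `absolute_src` are names made up for illustration, and the base URL is an assumption based on the question.

```python
from urllib.parse import urljoin

BASE_URL = 'http://www.example.com/'  # assumed base URL, with an explicit scheme


def page_url(image_name):
    """Build the page URL for one image name."""
    return urljoin(BASE_URL, image_name)


def absolute_src(page, src):
    """Resolve a possibly relative src attribute against the page it came from."""
    return urljoin(page, src)


for name in ['image1', 'image2', 'image3']:
    print(page_url(name))
```

Inside the loop you could pass `page_url(image_name)` to `requests.get`, and pass the extracted `src` through `absolute_src` before using it.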
Answer 1 (score: 0)
Assuming the code you posted above works, you can repeat the same steps inside a loop. Here is an example of how that might look.
import requests
from lxml import html

def picLooper():
    pictureList = ['image1', 'image2', 'image3']  # list of image names
    pictureURL = dict()  # maps each image name to its src URL
    for picture in pictureList:
        page = requests.get('http://www.example.com/' + picture)
        tree = html.fromstring(page.text)
        pic = tree.xpath(Copied XPath)  # placeholder: the XPath copied from the browser, as a string
        pictureURL[picture] = pic[0].attrib['src']  # key by image name
    return pictureURL
Note that this implementation assumes you already know the names of the images you want to capture. Hope this helps as a starting point! :D
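If some of the names might not exist, the loop can skip them instead of stopping at the first failure. This is only a sketch under a few assumptions: the base URL is the one from the question, the XPath selects `<img>` elements, and `fetch_src` and `collect_srcs` are hypothetical helper names. The fetch step is passed in as a parameter so the loop can be tried without touching the network.

```python
BASE_URL = 'http://www.example.com/'  # assumed base URL, with an explicit scheme


def fetch_src(page_url, xpath):
    """Return the src of the first element matching xpath, or None if nothing matches."""
    # Imported here so collect_srcs can be tried out without these packages installed.
    import requests
    from lxml import html

    page = requests.get(page_url, timeout=10)
    page.raise_for_status()  # raise on 4xx/5xx responses
    tree = html.fromstring(page.text)
    matches = tree.xpath(xpath)  # assumed to select <img> elements
    if not matches:
        return None
    return matches[0].get('src')  # None if the element has no src attribute


def collect_srcs(image_names, xpath, fetch=fetch_src):
    """Map each image name to its src, skipping pages that fail or have no match."""
    results = {}
    for name in image_names:
        try:
            src = fetch(BASE_URL + name, xpath)
        except Exception as exc:  # deliberately broad for a sketch
            print('skipping %s: %s' % (name, exc))
            continue
        if src is not None:
            results[name] = src
    return results
```

You would call it as `collect_srcs(['image1', 'image2', 'image3'], your_xpath)`, where `your_xpath` is the string copied from the browser. In real code it is probably better to catch `requests.RequestException` rather than every exception.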