Trouble with my image scraper

Time: 2014-07-11 14:29:58

Tags: python loops python-2.7 lxml

I'm trying to make an image scraper, and was wondering if anyone could help with the following example:

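import requests
from lxml import html

# requests needs an explicit scheme (http:// or https://) in the URL
page = requests.get('http://www.example.com/image1')
tree = html.fromstring(page.text)

# copied_xpath holds the XPath string copied from the browser
pic = tree.xpath(copied_xpath)

print(pic[0].attrib['src'])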

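In this case, `page` holds the URL of the picture's page, 'www.example.com/image1'. I'd like to know whether I can loop this process if I have a list of image names, e.g. image2, image3, image4, and so on.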

2 answers:

Answer 0: (score: 1)

Yes, it's possible:

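list_of_image_names = ['image1', 'image2', 'image3']

for image_name in list_of_image_names:
    # Build each page URL from the shared prefix; requests requires the scheme
    page = requests.get('http://www.example.com/' + image_name)
    tree = html.fromstring(page.text)

    # copied_xpath is the same XPath string used in the question
    pic = tree.xpath(copied_xpath)

    print(pic[0].attrib['src'])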
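
Two details in the loop above are easy to trip over: `requests.get` rejects a URL that has no scheme, and the `src` attribute you get back may be relative to the page rather than a full URL. Below is a minimal sketch of handling both with `urljoin` from the standard library. The names `BASE_URL`, `build_page_urls` and `absolute_src` are made up for illustration and are not part of the original code.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2.7

# Assumed base address; note the explicit scheme.
BASE_URL = 'http://www.example.com/'


def build_page_urls(base_url, image_names):
    """Return one page URL per image name."""
    return [urljoin(base_url, name) for name in image_names]


def absolute_src(page_url, src):
    """Resolve a possibly relative src attribute against the page it came from."""
    return urljoin(page_url, src)


page_urls = build_page_urls(BASE_URL, ['image1', 'image2', 'image3'])
```

Inside the loop you could then call `absolute_src(page_url, pic[0].attrib['src'])` before printing or storing the result. A `src` that is already absolute passes through unchanged.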

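Answer 1: (score: 0)

Assuming the code you posted above works, you can replicate the same functionality inside a loop. Here is an example of how that might work.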

def picLooper():
    pictureList = ['image1', 'image2', 'image3']  # list of image names
    pictureURL = dict()  # maps each image name to its src URL
    for picture in pictureList:
        page = requests.get('http://www.example.com/' + picture)
        tree = html.fromstring(page.text)

        # copied_xpath is the XPath string from the question
        pic = tree.xpath(copied_xpath)
        pictureURL[picture] = pic[0].attrib['src']
    return pictureURL

Note that this implementation assumes you already know the names of the images you want to capture. Hope this helps as a starting point! :D
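
Since the list of names is an assumption, some of those pages may not exist, or the XPath may match nothing on a given page. Here is a rough sketch of a loop that keeps going in those cases and records `None` for that name. `fetch_src` and `collect_srcs` are hypothetical helper names, and the XPath is passed in as a parameter because the original one isn't shown.

```python
def fetch_src(page_url, xpath_query):
    """Download one page; return the first match's src, or None if nothing matched."""
    # Imported lazily so collect_srcs can be tried with a stand-in fetcher.
    import requests
    from lxml import html

    page = requests.get(page_url)
    page.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    tree = html.fromstring(page.text)
    matches = tree.xpath(xpath_query)
    return matches[0].attrib.get('src') if matches else None


def collect_srcs(base_url, image_names, xpath_query, fetch=fetch_src):
    """Map each image name to its src; names that fail are mapped to None."""
    results = {}
    for name in image_names:
        try:
            results[name] = fetch(base_url + name, xpath_query)
        except Exception:  # broad on purpose for a sketch; narrow this in real code
            results[name] = None
    return results
```

You would call it as `collect_srcs('http://www.example.com/', ['image1', 'image2'], copied_xpath)` and then check which names came back as `None`.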