Download files from FTP or HTTP using file names from a CSV - Python 3

Asked: 2014-10-07 21:51:17

Tags: csv python-3.x ftp urllib

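I have been given a CSV product file for an e-commerce site I am working on, along with FTP access to the corresponding image for each product (~15K products).

I would like to use Python to pull only the images listed in the CSV, over FTP or HTTP, and save them locally.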

import urllib.request
import urllib.parse
import re

url='http://www.fakesite.com/pimages/filename.jpg'

split = urllib.parse.urlsplit(url)
filename = split.path.split("/")[-1]
urllib.request.urlretrieve(url, filename)

print(filename)

saveFile = open(filename,'r')
saveFile.close()

import csv

with open('test.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=",")

    images = []

    for row in readCSV:
        image = row[14]

print(image)

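My current code can extract the file name from a URL and save the file under that name. It can also pull the image file name out of the CSV file. (The file name and the image name are identical.) What I need it to do is take the file name from the CSV, append it to the end of the URL, and then save the file under that file name.

I have since progressed to this: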

import urllib.request
import urllib.parse
import re
import os
import csv

with open('test.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=",")

    images = []

    for row in readCSV:
        image = row[14]

        images.append(image)


x ='http://www.fakesite.com/pimages/'

url = os.path.join (x,image)

split = urllib.parse.urlsplit(url)
filename = split.path.split("/")[-1]
urllib.request.urlretrieve(url,filename)



saveFile = open(filename,'r')
saveFile.close()

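Now this is great. It works perfectly. It pulls the correct file name from the CSV file, adds it to the end of the URL, downloads the file, and saves it under that file name.

However, I can't figure out how to make this work for multiple rows of the CSV file. As it stands, it takes only the last row and pulls the relevant information from it. Ideally, I would feed it the CSV file containing all of the products, and it would go through and download every image, not just the last one.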

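1 Answer:

Answer 0 (score: 0)

You are doing this in a rather strange way. Your download code runs only once, after the loop has finished, so `image` holds just the value from the last row. Move the download into a loop over all the collected names: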

import urllib.request
import csv

# the images list should be outside the with block
images = []
IMAGE_COLUMN = 14

with open('test.csv') as csvfile:
    # read csv
    readCSV = csv.reader(csvfile, delimiter=",")
    for row in readCSV:
        # I guess 14 is the column-index of the image-name like 'image.jpg'
        # I've put it in some constant  

        # now append all the image-names into the list
        images.append(row[IMAGE_COLUMN])

        # no need for the following
        # image = row[14]
        # images.append(image)

# make sure root_url ends with a slash
# (x was not a descriptive name for a URL)
root_url = 'http://www.fakesite.com/pimages/'

# iterate through the list
for image in images:
    # you don't need os.path.join, because that's operating system dependent.
    # you don't need to urlsplit, because you have created the url yourself.
    # you don't need to split the filename as it is the image name
    # with the following line, the root_url must end with a slash
    url = root_url + image

    # urlretrieve saves the file as whatever image is into the current directory
    urllib.request.urlretrieve(url, image)
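
As a side note on the trailing-slash comment above: if you would rather not depend on that, the URL can be built with `urllib.parse`. This is only a minimal sketch. `build_image_url` is a hypothetical helper name of mine, and I am assuming the CSV may contain names with spaces or stray whitespace, which the question does not actually say.

```python
from urllib.parse import quote, urljoin


def build_image_url(root_url, image_name):
    """Join a base URL and a file name, tolerating a missing trailing slash."""
    # urljoin replaces the last path segment unless the base ends with "/",
    # so normalise the base first.
    base = root_url if root_url.endswith("/") else root_url + "/"
    # Strip surrounding whitespace and any leading slash, then percent-encode
    # characters (such as spaces) that are not valid in a URL path.
    safe_name = quote(image_name.strip().lstrip("/"))
    return urljoin(base, safe_name)


print(build_image_url("http://www.fakesite.com/pimages", "red shoe.jpg"))
```

Inside the loop you would then write `url = build_image_url(root_url, image)` instead of concatenating the strings.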

Or, in short, this is all you need:

import urllib.request
import csv

IMAGE_COLUMN = 14
ROOT_URL = 'http://www.fakesite.com/pimages/'
images = []

with open('test.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=",")
    for row in readCSV:
        images.append(row[IMAGE_COLUMN])

for image in images:
    url = ROOT_URL + image
    urllib.request.urlretrieve(url, image)
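
With ~15K products, a single missing image or malformed row would stop the short version halfway through. The following is a sketch of a more forgiving variant, not a definitive implementation. The function names `read_image_names` and `download_images`, the `has_header` flag, and the `fetch` parameter are all my own assumptions for illustration, and I am guessing that the CSV may contain blank rows, duplicates, or a header line.

```python
import csv
import urllib.request

IMAGE_COLUMN = 14
ROOT_URL = 'http://www.fakesite.com/pimages/'


def read_image_names(csv_path, column=IMAGE_COLUMN, has_header=False):
    """Return the unique, non-empty file names found in one CSV column."""
    names = []
    seen = set()
    # newline='' is what the csv module documentation recommends for files
    with open(csv_path, newline='') as csvfile:
        reader = csv.reader(csvfile, delimiter=",")
        if has_header:
            next(reader, None)  # skip the header row, if the file has one
        for row in reader:
            if len(row) <= column:
                continue  # short or blank row: no image column to read
            name = row[column].strip()
            if name and name not in seen:
                seen.add(name)
                names.append(name)
    return names


def download_images(names, root_url=ROOT_URL, fetch=urllib.request.urlretrieve):
    """Try to download every name; return (name, error) pairs that failed."""
    failed = []
    for name in names:
        try:
            # root_url must end with a slash, as in the code above
            fetch(root_url + name, name)
        except OSError as exc:
            # URLError and HTTPError are subclasses of OSError
            failed.append((name, exc))
    return failed
```

You would call it as `failed = download_images(read_image_names('test.csv'))` and then print or retry whatever ends up in `failed`. As far as I know, `urllib.request` also handles `ftp://` URLs, so the same loop should work against the FTP server if `root_url` points there, but I have not tested that against your server.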