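I've been given a CSV product file for an e-commerce site I'm working on, along with FTP access to the corresponding image for each product (~15K products).
I'd like to use Python to pull only the images listed in the CSV, over FTP or HTTP, and save them locally.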
import urllib.request
import urllib.parse
import re
url = 'http://www.fakesite.com/pimages/filename.jpg'
split = urllib.parse.urlsplit(url)
filename = split.path.split("/")[-1]
urllib.request.urlretrieve(url, filename)
print(filename)
saveFile = open(filename, 'r')
saveFile.close()
import csv
with open('test.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=",")
    images = []
    for row in readCSV:
        image = row[14]
        print(image)
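My current code can extract the filename from a URL and save the file under that name. It can also pull the image filenames out of the CSV file. (The filenames in the CSV and the image names on the server are identical.) What I need it to do is take a filename from the CSV, append it to the end of the URL, and then save that file under the same filename.
I've since progressed to this: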
import urllib.request
import urllib.parse
import re
import os
import csv
with open('test.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=",")
    images = []
    for row in readCSV:
        image = row[14]
        images.append(image)
x = 'http://www.fakesite.com/pimages/'
url = os.path.join(x, image)
split = urllib.parse.urlsplit(url)
filename = split.path.split("/")[-1]
urllib.request.urlretrieve(url, filename)
saveFile = open(filename, 'r')
saveFile.close()
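This works well as far as it goes: it pulls the correct filename from the CSV file, appends it to the end of the URL, downloads the file, and saves it under that filename.
However, I can't figure out how to make this work for multiple rows of the CSV file. As it stands, it only takes the last row and extracts the relevant information from it. Ideally, I would feed it the CSV file containing all of the products and it would go through and download every image, not just the last one.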
Answer 0 (score: 0)
You're doing some odd things here...
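For context, the question's symptom (only the last image is downloaded) most likely comes from the download code sitting after the `for` loop rather than inside it: once a loop ends, its variable still holds the final value. A tiny illustration, using made-up rows in place of the real CSV:

```python
# Hypothetical stand-in for the rows read from the CSV file.
rows = [["a.jpg"], ["b.jpg"], ["c.jpg"]]

for row in rows:
    image = row[0]

# Code placed after the loop runs once and only sees the final value.
last_only = [image]

# Work done per row (inside a loop) touches every entry.
every_row = [row[0] for row in rows]
```

The corrected version collects all the names first and then downloads inside a second loop: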
import urllib.request
import csv

# the images list should be outside the with block
images = []
IMAGE_COLUMN = 14

with open('test.csv') as csvfile:
    # read csv
    readCSV = csv.reader(csvfile, delimiter=",")
    for row in readCSV:
        # I guess 14 is the column-index of the image-name like 'image.jpg'
        # I've put it in some constant
        # now append all the image-names into the list
        images.append(row[IMAGE_COLUMN])
        # no need for the following
        # image = row[14]
        # images.append(image)

# make sure, root_url ends with a slash
# x was some strange name for an url
root_url = 'http://www.fakesite.com/pimages/'

# iterate through the list
for image in images:
    # you don't need os.path.join, because that's operating system dependent.
    # you don't need to urlsplit, because you have created the url yourself.
    # you don't need to split the filename as it is the image name
    # with the following line, the root_url must end with a slash
    url = root_url + image
    # urlretrieve saves the file as whatever image is into the current directory
    urllib.request.urlretrieve(url, image)
Or, in short, this is all you need:
import urllib.request
import csv

IMAGE_COLUMN = 14
ROOT_URL = 'http://www.fakesite.com/pimages/'

images = []
with open('test.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=",")
    for row in readCSV:
        images.append(row[IMAGE_COLUMN])

for image in images:
    url = ROOT_URL + image
    urllib.request.urlretrieve(url, image)
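With ~15K products, a single bad row or missing image would stop the loop above partway through. The following is only a sketch of one way to make it more forgiving, not a definitive version. The helper names `read_image_names`, `build_url`, and `download_all` are made up for illustration, and it assumes the CSV has no header row and that skipping short or empty rows is acceptable.

```python
import csv
import urllib.parse
import urllib.request

ROOT_URL = 'http://www.fakesite.com/pimages/'  # placeholder URL from the question
IMAGE_COLUMN = 14


def read_image_names(csv_path, column=IMAGE_COLUMN):
    """Return the non-empty values found in `column` of each CSV row."""
    names = []
    with open(csv_path, newline='') as csvfile:
        for row in csv.reader(csvfile, delimiter=","):
            # Skip rows that are too short or whose image cell is blank.
            if len(row) > column and row[column].strip():
                names.append(row[column].strip())
    return names


def build_url(root_url, name):
    """Append a percent-encoded file name to the root URL."""
    # Works whether or not root_url ends with a slash.
    return root_url.rstrip('/') + '/' + urllib.parse.quote(name)


def download_all(csv_path, root_url=ROOT_URL):
    """Try to download every listed image; return the names that failed."""
    failed = []
    for name in read_image_names(csv_path):
        try:
            urllib.request.urlretrieve(build_url(root_url, name), name)
        except OSError:
            # URLError and HTTPError are subclasses of OSError.
            failed.append(name)
    return failed
```

Calling `download_all('test.csv')` would then return the list of names that could not be fetched, which can be retried or checked by hand.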