I'm trying to write a program that sends URL requests to a site, which then generates a weather radar animation. I then scrape that page for the image URLs (they're stored in a Java module) and download them to a local folder. I do this iteratively over many radar stations and for two radar products. So far I've written the code that sends the request, parses the HTML, and lists the image URLs. What I can't seem to do is rename and save the images locally. Beyond that, I'd like to streamline this as much as possible, which it probably isn't so far. Any help with (1) getting the images to download to a local folder and (2) pointing me toward a more Pythonic way of doing this would be great.
# import modules
import os
import urllib2
import re
from bs4 import BeautifulSoup
##test variables##
stationName = "KBYX"
prod = ("bref1","vel1") # a tuple of both ref and vel
bkgr = "black"
duration = "1"
home_dir = "/path/to/home/directory/folderForImages"
##program##
# This program needs to do the following:
# read the folder structure from home directory to get radar names
#left off here
list_of_folders = os.listdir(home_dir)
for each_folder in list_of_folders:
    if each_folder.startswith('k'):
        print each_folder
# Each folder that starts with a "k" represents a radar station, and within each
# folder are two other folders, bref1 and vel1, for the two products. I want the
# program to read the folders to decide which radars to retrieve data for, so if
# I decide to add radars, all I have to do is add folders to the directory tree.
# first request will be for prod[0] - base reflectivity
# second request will be for prod[1] - base velocity
# sample path:
# http://weather.rap.ucar.edu/radar/displayRad.php?icao=KMPX&prod=bref1&bkgr=black&duration=1
#base part of the path
base = "http://weather.rap.ucar.edu/radar/displayRad.php?"
#additional parameters
call = base+"icao="+stationName+"&prod="+prod[0]+"&bkgr="+bkgr+"&duration="+duration
#read in the webpage
urlContent = urllib2.urlopen(call).read()
webpage=urllib2.urlopen(call)
#parse the webpage with BeautifulSoup
soup = BeautifulSoup(urlContent)
#print (soup.prettify()) # if you want to take a look at the parsed structure
# find the tag that holds all the filenames (they are nested in the PARAM tags,
# in the "value" attribute of the PARAM named "filename")
tag = soup.param.param.param.param.param.param.param
files_in=str(tag['value'])
files = files_in.split(',') # they're in a single element, so split them by comma
directory = home_dir+"/"+stationName+"/"+prod[1]+"/"
counter = 0
for file in files: # now we should THEORETICALLY be able to iterate over them to download them... here I just print them
    print file
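For part (2), the string concatenation used to build the query can be replaced with urllib.urlencode. A minimal sketch (Python 2, reusing the variables above and the same parameter names as the sample path; it loops over both products):
from urllib import urlencode
for product in prod:  # build one request per product (bref1, vel1)
    params = {"icao": stationName, "prod": product,
              "bkgr": bkgr, "duration": duration}
    call = base + urlencode(params)
    urlContent = urllib2.urlopen(call).read()
    soup = BeautifulSoup(urlContent)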
Answer 0 (score: 0)
I use these three functions to download from the internet:
from os import path, mkdir
from urllib import urlretrieve
def checkPath(destPath):
    # Add final slash if missing
    if destPath != None and len(destPath) and destPath[-1] != '/':
        destPath += '/'
    if destPath != '' and not path.exists(destPath):
        mkdir(destPath)
    return destPath
def saveResource(data, fileName, destPath=''):
    '''Saves data to file in binary write mode'''
    destPath = checkPath(destPath)
    with open(destPath + fileName, 'wb') as fOut:
        fOut.write(data)
def downloadResource(url, fileName=None, destPath=''):
    '''Saves the content at url in folder destPath as fileName'''
    # Default filename
    if fileName == None:
        fileName = path.basename(url)
    destPath = checkPath(destPath)
    try:
        urlretrieve(url, destPath + fileName)
    except Exception as inst:
        print 'Error retrieving', url
        print type(inst)  # the exception instance
        print inst.args   # arguments stored in .args
        print inst
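Applied to the question's file list, usage could look like this (a hypothetical snippet; it assumes each entry in files is a full image URL and uses the stationName/product folder layout described above):
# hypothetical usage with the question's variables
for file_url in files:
    downloadResource(file_url, destPath=home_dir + '/' + stationName + '/' + prod[0] + '/')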
There are many examples here of downloading images from various sites.
Answer 1 (score: 0)
To save the images locally, for example:
import os
IMAGES_OUTDIR = '/path/to/image/output/directory'
for file_url in files:
    image_content = urllib2.urlopen(file_url).read()
    image_outfile = os.path.join(IMAGES_OUTDIR, os.path.basename(file_url))
    with open(image_outfile, 'wb') as wfh:
        wfh.write(image_content)
If you want to rename them, use whatever name you want instead of os.path.basename(file_url).
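For instance, a numbered name built from the station and product (a sketch; the frame numbering and the .gif extension are assumptions, not taken from the page):
# hypothetical renaming scheme: KBYX_bref1_000.gif, KBYX_bref1_001.gif, ...
for frame_number, file_url in enumerate(files):
    out_name = "%s_%s_%03d.gif" % (stationName, prod[0], frame_number)
    image_content = urllib2.urlopen(file_url).read()
    with open(os.path.join(IMAGES_OUTDIR, out_name), 'wb') as wfh:
        wfh.write(image_content)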