How to download NASA satellite OPeNDAP data using Python

Time: 2016-10-17 14:14:03

Tags: python httprequest netcdf opendap

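I have tried requests, pydap, urllib, and netcdf4, and I keep getting either redirect errors or permission errors when trying to download the following NASA data:

GLDAS_NOAH025SUBP_3H: GLDAS Noah Land Surface Model L4 3 Hourly 0.25 x 0.25 degree Subsetted V001 (http://disc.sci.gsfc.nasa.gov/uui/datasets/GLDAS_NOAH025SUBP_3H_V001/summary?keywords=Hydrology)

I am trying to download about 50k files. Here is one example URL, which works when pasted into the Google Chrome browser (provided you have a valid username and password):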

http://hydro1.gesdisc.eosdis.nasa.gov/daac-bin/OTF/HTTP_services.cgi?FILENAME=%2Fdata%2FGLDAS_V1%2FGLDAS_NOAH025SUBP_3H%2F2016%2F244%2FGLDAS_NOAH025SUBP_3H.A2016244.2100.001.2016256190725.grb&FORMAT=TmV0Q0RGLw&BBOX=-11.95%2C28.86%2C-0.62%2C40.81&LABEL=GLDAS_NOAH025SUBP_3H.A2016244.2100.001.2016286201048.pss.nc&SHORTNAME=GLDAS_NOAH025SUBP_3H&SERVICE=SUBSET_GRIB&VERSION=1.02&LAYERS=AAAB&DATASET_VERSION=001

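Does anyone have experience fetching NASA OPeNDAP data from the web using Python? I am happy to provide more information if needed.

Here is the requests attempt, which returns a 401 error: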

import requests

def httpdownload():
    '''loop through each line in the text file and open url'''
    httpfile = open(pathlist[0]+"NASAdownloadSample.txt", "r")
    for line in httpfile:
        print line 
        outname = line[-134:-122]+".hdf"
        print outname 
        username = ""
        password = "*"
        r = requests.get(line, auth=("username", "password"), stream=True)
        print r.text
        print r.status_code
        with open(pathlist[0]+outname, 'wb') as out:
             out.write(r.content)
        print outname, "finished" # keep track of progress

Here is the pydap example, which gives a redirect error:

import install_cas_client
from pydap.client import open_url

def httpdownload():
    '''loop through each line in the text file and open url'''
    username = ""
    password = ""
    httpfile = open(pathlist[0]+"NASAdownloadSample.txt", "r")
    fileone = httpfile.readline()
    filetot = fileone[:7]+username+":"+password+"@"+fileone[7:]
    print filetot
    dataset = open_url(filetot)

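2 Answers:

Answer 0: (score: 4)

I did not find a solution using Python, but given the information I have now it should be possible. I used wget with a .netrc file and a cookie file, as shown below (https://disc.gsfc.nasa.gov/information/howto?title=How%20to%20Download%20Data%20Files%20from%20HTTP%20Service%20with%20wget):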

#!/bin/bash 

cd # path to output files 
touch .netrc
echo "machine urs.earthdata.nasa.gov login <username> password <password>" >> .netrc
chmod 0600 .netrc
touch .urs_cookies
wget --content-disposition --trust-server-names --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --auth-no-challenge=on --keep-session-cookies \
    -i <path to text file of url list>

Hopefully this helps anyone else using NASA data from this server.
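The .netrc file created by the script above is not specific to wget. As a minimal sketch, Python's standard-library netrc module can read the same entry, which is one way to keep the username and password out of a download script. The function name read_earthdata_credentials and its parameters are illustrative, not part of any NASA tooling.

```python
import netrc


def read_earthdata_credentials(netrc_path=None, host="urs.earthdata.nasa.gov"):
    """Return (username, password) for `host` from a .netrc file.

    When netrc_path is None, the netrc module falls back to ~/.netrc,
    which is the file the wget script above appends to.
    """
    auth = netrc.netrc(netrc_path).authenticators(host)
    if auth is None:
        # No matching "machine" entry (and no "default" entry) was found.
        raise LookupError("no entry for %s in the netrc file" % host)
    login, _account, password = auth
    return login, password
```

Calling read_earthdata_credentials() with no arguments should return the login and password stored for urs.earthdata.nasa.gov, which can then be passed to whichever HTTP client is used.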

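Answer 1: (score: 1)

I realize it is a bit late to answer this for the original poster, but I stumbled on this question while trying to do the same thing, so I will leave my solution here. The NASA servers appear to use redirects and basic authorization in a way that the standard libraries do not expect. When you download from (for example) https://hydro1.gesdisc.eosdis.nasa.gov, you are redirected to https://urs.earthdata.nasa.gov for authentication. That server sets an authentication token as a cookie and then redirects you back to download the file. If you do not handle cookies properly, you get stuck in an infinite redirect loop. If you do not handle authentication and redirects properly, you get an access-denied error.

To get around this, chain an HTTPRedirectHandler, an HTTPCookieProcessor, and an HTTPBasicAuthHandler backed by an HTTPPasswordMgrWithDefaultRealm into a single opener, then either install it as the default opener or use it directly.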

from urllib import request

username = "<your username>"
password = "<your password>"
url = "<remote url of file>"
filename = "<local destination of file>"

redirectHandler = request.HTTPRedirectHandler()
cookieProcessor = request.HTTPCookieProcessor()
passwordManager = request.HTTPPasswordMgrWithDefaultRealm()
passwordManager.add_password(None, "https://urs.earthdata.nasa.gov", username, password)
authHandler = request.HTTPBasicAuthHandler(passwordManager)
opener = request.build_opener(redirectHandler,cookieProcessor,authHandler)
request.install_opener(opener)
request.urlretrieve(url,filename)
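For a long list of URLs like the one in the question, the same opener can be reused in a loop. The following is a rough sketch rather than a tested downloader. It assumes that the LABEL query parameter holds the intended output filename, as it appears to in the example URL. The helper names label_from_url, build_earthdata_opener, and download_all are made up for illustration. It has not been run against the NASA servers.

```python
import os
import shutil
from urllib import request
from urllib.parse import parse_qs, urlparse


def label_from_url(url):
    """Return the LABEL query parameter (assumed to be the output filename)."""
    query = parse_qs(urlparse(url.strip()).query)
    labels = query.get("LABEL")
    if not labels:
        raise ValueError("URL has no LABEL parameter: %r" % url)
    return labels[0]


def build_earthdata_opener(username, password):
    """Build an opener that keeps cookies and answers basic-auth challenges."""
    password_manager = request.HTTPPasswordMgrWithDefaultRealm()
    password_manager.add_password(
        None, "https://urs.earthdata.nasa.gov", username, password
    )
    # build_opener() already includes HTTPRedirectHandler by default.
    return request.build_opener(
        request.HTTPCookieProcessor(),
        request.HTTPBasicAuthHandler(password_manager),
    )


def download_all(url_list_path, out_dir, opener):
    """Download every URL listed (one per line) in url_list_path into out_dir."""
    with open(url_list_path) as url_file:
        for line in url_file:
            url = line.strip()  # drop the trailing newline before requesting
            if not url:
                continue
            target = os.path.join(out_dir, label_from_url(url))
            with opener.open(url) as response, open(target, "wb") as out:
                shutil.copyfileobj(response, out)
            print(target, "finished")  # keep track of progress
```

A call such as download_all("NASAdownloadSample.txt", ".", build_earthdata_opener(username, password)) would then walk the list. Reusing one opener keeps the authentication cookie across files, so the login redirect should only be needed while the cookie is missing or expired.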