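Fetch a file from an authenticated site (with Python urllib, urllib2)

Time: 2014-07-18 23:15:18

Tags: python urllib2 urllib

I'm trying to get a queried Excel file from a site. When I go to the direct link, it leads to a login page, and once I've entered my username and password, it proceeds to download the Excel file automatically. I'm trying to avoid installing additional modules that are not part of standard Python (this script will run on a "standardized machine", and it won't work if a module is not installed).

I've tried the following, but the resulting Excel file contains the login page instead: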

import urllib

url = "myLink_queriedResult/result.xls"
urllib.urlretrieve(url,"C:\\test.xls")

So I then looked into using urllib2 for password authentication, but then I got stuck.

I have the following code:

import urllib2
import urllib

theurl = 'myLink_queriedResult/result.xls'
username = 'myName'
password = 'myPassword'

passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, theurl, username, password)

authhandler = urllib2.HTTPBasicAuthHandler(passman)
opener = urllib2.build_opener(authhandler)
urllib2.install_opener(opener)
pagehandle = urllib2.urlopen(theurl)
pagehandle.read()  # but this still seems to contain only the login page

Thanks in advance for any advice. :)

4 Answers:

Answer 0 (score: 1)

Urllib is generally eschewed these days in favor of Requests.

This would do what you want:

import requests
from requests.auth import HTTPBasicAuth

theurl = 'myLink_queriedResult/result.xls'
username = 'myUsername'
password = 'myPassword'

r = requests.get(theurl, auth=HTTPBasicAuth(username, password))

You can find more information on authentication using Requests.
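Since the question rules out third-party modules, here is a rough standard-library sketch of the same idea. `HTTPBasicAuth` sends the `Authorization` header with the very first request, whereas `urllib2.HTTPBasicAuthHandler` only answers a 401 challenge from the server. The helper names `basic_auth_header` and `build_request` are made up for illustration, and this only helps if the site really uses HTTP Basic auth rather than a login form.

```python
import base64

try:  # Python 3
    from urllib.request import Request
except ImportError:  # Python 2
    from urllib2 import Request


def basic_auth_header(username, password):
    """Return the value of an HTTP Basic Authorization header."""
    credentials = "{0}:{1}".format(username, password).encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def build_request(url, username, password):
    """Build a request that carries the credentials up front (hypothetical helper)."""
    request = Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request
```

The request object can then be passed to `urllib2.urlopen` (or `urllib.request.urlopen` on Python 3) in place of the bare URL.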

Answer 1 (score: 1)

You can try this with Python 3:

    import requests
    # import the authentication method the site needs (NTLM here)
    from requests_ntlm import HttpNtlmAuth
    # pandas needs xlrd installed to read legacy .xls files
    import pandas as pd
    from io import BytesIO

    r = requests.get("http://example.website", auth=HttpNtlmAuth('acc', 'password'))
    # parse the downloaded bytes in memory instead of saving them first
    xd = pd.read_excel(BytesIO(r.content))

References:

  1. https://medium.com/ibm-data-science-experience/excel-files-loading-from-object-storage-python-a54a2cbf4609

  2. http://www.python-requests.org/en/latest/user/authentication/#basic-authentication

  3. Pandas read_csv from url
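A cheap sanity check before handing `r.content` to `pd.read_excel` is to look at the first few bytes, because a failed login usually returns an HTML page rather than a workbook. The sketch below relies on two well-known file signatures (the OLE2 header for legacy .xls and the ZIP header for .xlsx); the function name `looks_like_excel` is just an illustration.

```python
XLS_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # OLE2 container used by legacy .xls
XLSX_MAGIC = b"PK\x03\x04"  # ZIP container used by .xlsx


def looks_like_excel(payload):
    """Return True if the bytes start with a known Excel container signature."""
    return payload.startswith(XLS_MAGIC) or payload.startswith(XLSX_MAGIC)
```

If `looks_like_excel(r.content)` is false, the response is most likely the login page again and the authentication method needs another look.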

Answer 2 (score: 0)

You need to use cookies to allow authentication.
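A minimal standard-library sketch of that idea, assuming the site uses an ordinary HTML login form: post the credentials once with an opener that keeps cookies, then request the file with the same opener. The form field names (`username`, `password`), the login URL, and all function names here are assumptions; the real field names have to be read from the login page's HTML, and some sites also require hidden fields or tokens that this sketch does not handle.

```python
try:  # Python 3
    from urllib.request import build_opener, HTTPCookieProcessor
    from urllib.parse import urlencode
    from http.cookiejar import CookieJar
except ImportError:  # Python 2
    from urllib2 import build_opener, HTTPCookieProcessor
    from urllib import urlencode
    from cookielib import CookieJar


def make_session_opener():
    """Return an opener that stores and resends cookies, plus its cookie jar."""
    jar = CookieJar()
    return build_opener(HTTPCookieProcessor(jar)), jar


def encode_login_form(username, password,
                      user_field="username", pass_field="password"):
    """URL-encode the login form body; the field names are assumptions."""
    form = [(user_field, username), (pass_field, password)]
    return urlencode(form).encode("ascii")


def download_after_login(login_url, file_url, username, password, dest):
    """POST the credentials, then fetch the file with the same cookies."""
    opener, _jar = make_session_opener()
    # Passing a body makes this a POST; the session cookie lands in the jar.
    opener.open(login_url, encode_login_form(username, password)).close()
    response = opener.open(file_url)
    try:
        with open(dest, "wb") as out:
            out.write(response.read())
    finally:
        response.close()
```

A call would look something like `download_after_login(login_url, theurl, username, password, "C:\\test.xls")`, where `login_url` is whatever address the login form submits to.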

Answer 3 (score: 0)

You can use requests.get to download the file. Try the sample code:

curl -XPOST "http://localhost:9200/localhost:9200/ex1/ex2/WPatZHgBEd7rI-6ZwNFC/_update?pretty" -H 'Content-Type: application/json' -d'{  "script": {    "source": "ctx._source.price += params.increment_by",    "params": {      "increment_by": 50    }  }}'
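The command above does not itself call requests.get, so here is a small sketch of the download step on its own, kept to the standard library because the question asks for that. It copies any file-like HTTP response to disk in chunks; the name `save_response` and the 64 KiB chunk size are arbitrary choices for illustration.

```python
import shutil


def save_response(response, dest, chunk_size=64 * 1024):
    """Copy a file-like HTTP response to disk in chunks and return the path."""
    with open(dest, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return dest
```

For example, `save_response(urllib2.urlopen(theurl), "C:\\test.xls")` would write the response body to disk, assuming the request is already authenticated.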

Enjoy!