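I am trying to fetch an Excel file of query results from a website. When I open the direct link, it goes to a login page, and once I enter my username and password, it proceeds to download the Excel file automatically. I am trying to avoid installing additional modules that are not part of standard Python (this script will run on a "standardized machine", and it will not work if a module is not installed).
I tried the following, but the saved Excel file contains the login page instead of the data: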
import urllib
url = "myLink_queriedResult/result.xls"
urllib.urlretrieve(url,"C:\\test.xls")
So then I looked into password authentication with urllib2, but I got stuck.
I have the following code:
import urllib2
import urllib
theurl = 'myLink_queriedResult/result.xls'
username = 'myName'
password = 'myPassword'
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, theurl, username, password)
authhandler = urllib2.HTTPBasicAuthHandler(passman)
opener = urllib2.build_opener(authhandler)
urllib2.install_opener(opener)
pagehandle = urllib2.urlopen(theurl)
pagehandle.read()  # but it still seems to contain only the login page
Thanks in advance for any suggestions. :)
Answer 0 (score: 1)
These days urllib is usually avoided in favor of Requests.
This should do what you want:
import requests
from requests.auth import HTTPBasicAuth
theurl = 'myLink_queriedResult/result.xls'
username = 'myUsername'
password = 'myPassword'
r = requests.get(theurl, auth=HTTPBasicAuth(username, password))
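Because the question rules out third-party modules, it may help to see the same Basic-auth request made with the standard library alone. The following is a minimal Python 3 sketch, not the answer author's code; the helper names `basic_auth_request` and `download` are hypothetical. It sends the credentials up front in an `Authorization` header, which only helps if the server really uses HTTP Basic authentication. A site that shows an HTML login form will most likely ignore the header.

```python
import base64
import urllib.request


def basic_auth_request(url, username, password):
    """Build a Request that carries HTTP Basic credentials (hypothetical helper)."""
    raw = "{}:{}".format(username, password).encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    req = urllib.request.Request(url)
    # Send the credentials immediately instead of waiting for a 401 challenge.
    req.add_header("Authorization", "Basic " + token)
    return req


def download(url, username, password, dest):
    """Fetch url with Basic auth and write the response body to dest."""
    req = basic_auth_request(url, username, password)
    with urllib.request.urlopen(req, timeout=30) as resp, open(dest, "wb") as out:
        out.write(resp.read())
    return dest
```

Calling `download(theurl, username, password, "test.xls")` would write whatever the server returns to disk, so the saved file is worth inspecting before it is treated as a spreadsheet.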
Answer 1 (score: 1)
You can try this with Python 3:
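import requests
# Import the authentication method the server requires (NTLM in this case)
from requests_ntlm import HttpNtlmAuth
from xlrd import open_workbook  # not used directly; pandas relies on xlrd to read .xls files
import pandas as pd
from io import BytesIO
r = requests.get("http://example.website", auth=HttpNtlmAuth('acc', 'password'))
# Parse the downloaded bytes in memory, without writing a file to disk
xd = pd.read_excel(BytesIO(r.content))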
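Before handing bytes such as `r.content` to a parser, it can be useful to check whether the server actually returned a spreadsheet or, as in the question, a login page. The sketch below is a standard-library heuristic, not part of the answer above; `classify_download` is a hypothetical helper name. It relies on the file signatures of the legacy `.xls` container format and of ZIP archives (which `.xlsx` files are), and treats a body that starts with HTML markup as a probable login page.

```python
# Signature of the OLE2 Compound File container used by legacy .xls files.
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")
# .xlsx files are ZIP archives, which start with this local-file header.
ZIP_MAGIC = b"PK\x03\x04"


def classify_download(body):
    """Guess what a downloaded byte string is: 'xls', 'xlsx', 'html' or 'unknown'."""
    if body.startswith(OLE2_MAGIC):
        return "xls"
    if body.startswith(ZIP_MAGIC):
        return "xlsx"
    # Look only at the start of the body, ignoring leading whitespace and case.
    head = body[:256].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"
```

A result of `"html"` suggests that authentication did not succeed and the login page was saved instead of the file.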
Answer 2 (score: 0)
You need to use cookies to allow authentication.
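One way to act on this with only the standard library is to let `http.cookiejar` keep the session cookie between a login request and the download. The sketch below is written for Python 3 and assumes the site uses a plain HTML form login. The `login_url` parameter, the form field names `username` and `password`, and all helper names are hypothetical and would have to be read off the real login page. Sites that add hidden tokens or JavaScript steps to the form would need more than this.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that stores and resends cookies, plus its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def encode_login_form(username, password,
                      user_field="username", pass_field="password"):
    """URL-encode the login form body; the field names are assumptions."""
    fields = {user_field: username, pass_field: password}
    return urllib.parse.urlencode(fields).encode("ascii")


def login_and_download(login_url, file_url, username, password, dest):
    """Log in with a form POST, then fetch file_url using the same cookies."""
    opener, _jar = make_session()
    # Passing data= makes this a POST; any session cookie lands in the jar.
    with opener.open(login_url, data=encode_login_form(username, password),
                     timeout=30) as resp:
        resp.read()
    # The opener resends the stored cookies, so the server sees a logged-in session.
    with opener.open(file_url, timeout=30) as resp, open(dest, "wb") as out:
        out.write(resp.read())
    return dest
```

On Python 2, which the question uses, the corresponding modules are `cookielib` and `urllib2`, with `urllib.urlencode` for the form body.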
Answer 3 (score: 0)
You can use requests.get to download the file. Try the sample code:
curl -XPOST "http://localhost:9200/localhost:9200/ex1/ex2/WPatZHgBEd7rI-6ZwNFC/_update?pretty" -H 'Content-Type: application/json' -d'{ "script": { "source": "ctx._source.price += params.increment_by", "params": { "increment_by": 50 } }}'
Enjoy!