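Edit: problem solved. In the end, the cause was "http:" instead of "https:" in the URL (just a silly mistake on my part). It was the nice, clean code sample from cetver that helped me isolate the problem. Thanks to everyone who offered suggestions.
Putting this URL into Firefox triggers the appropriate download and "Save As" dialog: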
https://www.virwox.com/orders.php?download_open=Download&format_open=.xls
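The link above is equivalent to submitting the form with the "Download" button on the https://www.virwox.com/orders.php page.
Here is the relevant HTML of the form that produces the URL above: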
<form action='orders.php' method='get'><fieldset><legend>Open Orders (2):</legend>
<input type='submit' value='Download' name='download_open' />
<select name='format_open'>
<option value='.xls'>.xls</option>
<option value='.csv'>.csv</option>
<option value='.xml'>.xml</option></select>
</form>
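Because the form declares method='get', a browser simply appends the field names and the selected values to the action URL as a query string. A minimal sketch of that step, written for Python 3 (`urllib.parse`) rather than the Python 2 modules used elsewhere in this thread; the helper name `build_form_url` is hypothetical:

```python
from urllib.parse import urlencode, urljoin

PAGE_URL = "https://www.virwox.com/orders.php"  # page that contains the form


def build_form_url(page_url, action, fields):
    """Return the URL a browser would request for a method='get' form."""
    # A relative action such as 'orders.php' is resolved against the page URL.
    target = urljoin(page_url, action)
    # A list of pairs keeps the fields in the order they are given.
    return target + "?" + urlencode(fields)


download_url = build_form_url(
    PAGE_URL,
    "orders.php",
    [("download_open", "Download"), ("format_open", ".xls")],
)
print(download_url)
```

Resolving the action against the page URL also means the scheme (https) is inherited from the page rather than typed a second time.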
But when I try the Python code below (which I expected to work)...
# get orders list
openOrders_url = virwoxTopLevel_url+"/orders.php"
openOrders_params = urlencode( { "download_open":"Download", "format_open":".xml" } )
openOrders_request = urllib2.Request(openOrders_url,openOrders_params,headers)
openOrders_response = virwox_opener.open(openOrders_request)
openOrders_xml = openOrders_response.read()
print(openOrders_xml)
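... openOrders_xml ends up being just the original page (https://www.virwox.com/orders.php).
How does Firefox know that there is also a file to download, and how can I detect and download that file in Python?
Note that this is not a security/login issue: if I had authentication problems, I would not even be able to fetch the orders.php page.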
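On how a browser decides to offer a download: servers commonly mark such responses with a `Content-Disposition: attachment` header (often carrying a filename), and browsers also look at the `Content-Type`. Whether VirWoX does exactly this is an assumption, not something shown in this thread. A Python 3 sketch of checking for it; `describe_download` is a hypothetical helper and the sample headers are made up for illustration:

```python
from email.message import Message


def describe_download(headers):
    """Return (is_attachment, filename) for a set of response headers.

    `headers` can be any email.message.Message; in Python 3 the
    `headers` attribute of a urllib response is a subclass of it.
    """
    disposition = headers.get_content_disposition()  # None if header absent
    return disposition == "attachment", headers.get_filename()


# Made-up headers standing in for a real response.
sample = Message()
sample["Content-Type"] = "text/xml"
sample["Content-Disposition"] = 'attachment; filename="orders.xml"'

html_page = Message()
html_page["Content-Type"] = "text/html"

print(describe_download(sample))
print(describe_download(html_page))
```

With a real response, a call along the lines of `describe_download(response.headers)` would show whether the server sent a file or just the ordinary page.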
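Edit: I wonder whether this has something to do with redirects (I am using the basic redirect handler), or whether I should be using something like urllib.urlretrieve().
Edit: here is the code of the full program, in case it is relevant...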
import urllib
import urllib2
import cookielib
import pprint
from urllib import urlencode
username=###############
password=###############
virwoxTopLevel_url = "http://www.virwox.com/"
overview_url = "https://www.virwox.com/index.php"
# Header
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
headers = { 'User-Agent' : user_agent }
# Handlers...
# cookie handler...
cookie_handler= urllib2.HTTPCookieProcessor( cookielib.CookieJar() )
# redirect handler...
redirect_handler= urllib2.HTTPRedirectHandler()
# create "opener" (OpenerDirector instance)
virwox_opener = urllib2.build_opener(redirect_handler,cookie_handler)
# login
login_url = "https://www.virwox.com/index.php"
values = { 'uname' : username, 'password' : password }
login_data = urllib.urlencode(values)
login_request = urllib2.Request(login_url,login_data,headers)
login_response = virwox_opener.open(login_request)
overview_html = login_response.read();
virwox_json_url = "http://api.virwox.com/api/json.php"
getTest = urllib.urlencode( { "method":"getMarketDepth", "symbols[0]":"EUR/SLL","symbols[1]":"USD/SLL","buyDepth":1,"sellDepth":1,"id":1 } )
get_response = urllib2.urlopen(virwox_json_url,getTest)
#print get_response.read()
# get orders list
openOrders_url = virwoxTopLevel_url+"/orders.php"
openOrders_params = urlencode( { "download_open":"Download", "format_open":".xml" } )
openOrders_request = urllib2.Request(openOrders_url,openOrders_params,headers)
openOrders_response = virwox_opener.open(openOrders_request)
openOrders_xml = openOrders_response.read()
# the following prints the html of the /orders.php page not the desired download data:
print "******************************************"
print(openOrders_xml)
print "******************************************"
print openOrders_response.info()
print openOrders_response.geturl()
print "******************************************"
# the following prints nothing, I assume because without the cookie handler it fails to authenticate
# (note that authentication is done by the PHP program, not HTTP authentication, so there is no "authentication handler" above)
print urllib2.urlopen("https://www.virwox.com/orders.php?download_open=Download&format_open=.xml").read()
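One detail in the program above may be worth checking alongside the URL scheme: with urllib2, passing a data argument to Request turns the request into a POST, whereas the form declares method='get' and the working Firefox URL carries the fields in the query string. Whether the server treats the two differently is not established in this thread. The sketch below uses Python 3's `urllib.request`, where `Request` behaves the same way in this respect; nothing is actually sent over the network.

```python
from urllib.parse import urlencode
from urllib.request import Request

ORDERS_URL = "https://www.virwox.com/orders.php"
fields = {"download_open": "Download", "format_open": ".xml"}
query = urlencode(fields)

# Supplying a body (bytes in Python 3) makes the request a POST.
as_post = Request(ORDERS_URL, data=query.encode("ascii"))

# Putting the same fields in the query string keeps it a GET,
# which is what the form's method='get' produces in a browser.
as_get = Request(ORDERS_URL + "?" + query)

print(as_post.get_method(), as_post.full_url)
print(as_get.get_method(), as_get.full_url)
```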
Answer 0 (score: 1)
The code below is untested.
Something like this:
import urllib, urllib2

HOST = 'https://www.virwox.com'
FORMS = {
    'login': {
        'action': HOST + '/index.php',
        'data': urllib.urlencode( {
            'uname':'username',
            'password':'******'
        } )
    },
    'orders': {
        'action': HOST + '/orders.php',
        'data': urllib.urlencode( {
            'download_open':'Download',
            'format_open':'.xml'
        } )
    }
}

# the cookie processor keeps the session cookie between requests
opener = urllib2.build_opener( urllib2.HTTPCookieProcessor() )

try:
    req = urllib2.Request( url = FORMS['login']['action'], data = FORMS['login']['data'] )
    opener.open( req ) #save login cookie
    print 'Login: OK'
except Exception, e:
    print 'Login: Fail'
    print e

try:
    req = urllib2.Request( url = FORMS['orders']['action'], data = FORMS['orders']['data'] )
    print 'Orders Page: OK'
except Exception, e:
    print 'Orders Page: Fail'
    print e

try:
    xml = opener.open( req ).read()
    print xml
except Exception, e:
    print 'Obtain XML: Fail'
    print e
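For readers on Python 3, the same flow might look roughly like the sketch below. It is untested against the real site. The form field names are taken from the question, while `make_opener`, `login_request`, `orders_request` and `fetch_open_orders` are hypothetical helper names. Two choices are this sketch's own: every URL is built from one https base (the question's edit says a stray "http:" was the actual cause), and the orders request is sent as a GET to mirror the form, unlike the POST in the answer above.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener

HOST = "https://www.virwox.com"  # single https base for every URL


def make_opener():
    """Opener that keeps the session cookie between requests."""
    return build_opener(HTTPCookieProcessor(CookieJar()))


def login_request(username, password):
    """POST request for the login form (field names from the question)."""
    body = urlencode({"uname": username, "password": password})
    return Request(HOST + "/index.php", data=body.encode("utf-8"))


def orders_request(fmt=".xml"):
    """GET request mirroring the 'Download' form submission."""
    query = urlencode({"download_open": "Download", "format_open": fmt})
    return Request(HOST + "/orders.php?" + query)


def fetch_open_orders(username, password, fmt=".xml"):
    """Log in, then return the raw bytes of the orders download."""
    opener = make_opener()
    opener.open(login_request(username, password)).close()  # stores cookie
    with opener.open(orders_request(fmt)) as response:
        return response.read()
```

Only `fetch_open_orders` touches the network; the two request builders can be inspected on their own (for example via `full_url` and `get_method()`) before anything is sent.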
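Answer 1 (score: 1)
Your question seems to have been answered already, but you may want to take a look at the Requests package. It is basically a nice wrapper around the standard library tools. The following (probably) does what you want.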
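import requests

# note: auth= sends HTTP Basic credentials; the site in the question uses a form login instead
r = requests.get('https://www.virwox.com/orders.php',
                 allow_redirects=True,
                 auth=('user', 'pass'),
                 # params= puts the fields in the query string, as the GET form does
                 params={'download_open': 'Download', 'format_open': '.xls'})
print r.content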
Answer 2 (score: 0)
You might need urllib2.HTTPPasswordMgr.
Something like this (untested, since I don't have your uname/pw):
import urllib
import urllib2
uri = "http://www.virwox.com/"
url = uri + "orders.php"
uname = "USERNAME"
password = "PASSWORD"
post = urllib.urlencode({"download_open":"Download", "format_open":".xls"})
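# realm=None only acts as a catch-all with the WithDefaultRealm variant of the password manager
pwMgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
pwMgr.add_password(realm=None, uri=uri, user=uname, passwd=password)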
urllib2.install_opener(urllib2.build_opener(urllib2.HTTPDigestAuthHandler(pwMgr)))
req = urllib2.Request(url, post)
s = urllib2.urlopen(req)
cookie = s.headers['Set-Cookie']
s.close()
req.add_header('Cookie', cookie)
s = urllib2.urlopen(req)
source = s.read()
s.close()
Then you can:
print source
to see whether it contains the XML data you need.
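One way to automate that check, sketched in Python 3: try to parse the body as XML and make sure the root element is not an HTML page. The sample payloads and the `looks_like_xml_export` name are invented for illustration; the element names of the real export are not shown in this thread.

```python
import xml.etree.ElementTree as ET


def looks_like_xml_export(body):
    """Return True if `body` parses as XML and is not an (X)HTML page."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False  # ordinary HTML is often not well-formed XML
    # Drop any namespace prefix such as '{http://www.w3.org/1999/xhtml}html'.
    tag = root.tag.rsplit("}", 1)[-1].lower()
    return tag != "html"


# Invented payloads, for illustration only.
xml_body = b"<?xml version='1.0'?><orders><order id='1'/></orders>"
html_body = b"<html><head><title>Orders</title></head><body><br></body></html>"

print(looks_like_xml_export(xml_body))
print(looks_like_xml_export(html_body))
```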