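I have been trying to solve this problem all day. The page http://www.some.site/index.php asks for a username and password and sends a cookie. I managed to log in like this: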
import urllib, urllib2, cookielib, os
import re # not required here but tried it out though
import requests # not required here but tried it out though
username = 'somebody'
password = 'somepass'
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'username' : username, 'j_password' : password})
resp = opener.open('http://www.some.site/index.php', login_data)
print resp.read()
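The problem is that in the middle of the page there is a link for downloading an .xls file: http://www.some.site/excel_file.php?/t=1303457489. I can download the file in any browser (Mozilla, Chrome, IE), but not with Python. The part after .php (i.e. ?t=1370919996) changes every time I log in or refresh the page.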
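Because the t value changes on every login or refresh, the download URL has to be read from the page served to the logged-in session rather than hard-coded. Below is a minimal Python 3 sketch using the standard-library html.parser (in Python 2 the modules are HTMLParser and urlparse). It assumes the link is an ordinary <a href=...> whose URL contains excel_file.php; the real markup of that page is not shown in the question, so the sample HTML and the helper names ExcelLinkFinder and find_excel_links are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urljoin, urlparse


class ExcelLinkFinder(HTMLParser):
    """Collect the href of every <a> tag that points at excel_file.php."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and "excel_file.php" in href:
            # Resolve relative links such as "excel_file.php?t=..." against the page URL
            self.links.append(urljoin(self.base_url, href))


def find_excel_links(html, base_url):
    finder = ExcelLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.links


# Hypothetical fragment of the page shown after logging in
sample = '<p>Report: <a href="excel_file.php?t=1370919996">download .xls</a></p>'
links = find_excel_links(sample, "http://www.some.site/index.php")
print(links)  # ['http://www.some.site/excel_file.php?t=1370919996']
print(parse_qs(urlparse(links[0]).query)["t"])  # ['1370919996']
```

A real HTML parser is used here instead of a regular expression so that attribute order, extra attributes, upper-case tags and relative URLs do not break the match.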
Maybe I'm wrong, but I believe this parameter is generated from the cookie (or the session cookie); however, the cookie only contains this: ('set-cookie', 'PHPSESSID=9cde55534fcc8e136fcf6588c0d0f1df; path=/')
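The Set-Cookie value quoted above can be inspected with Python 3's http.cookies module (Cookie in Python 2). It contains just a name, a value and a path attribute; with PHP's default session handling that value is an opaque ID the server uses to look up session data stored on its side, so later requests need to send this same PHPSESSID back rather than derive anything from it. A small sketch:

```python
from http.cookies import SimpleCookie

# The header value quoted in the question
header = "PHPSESSID=9cde55534fcc8e136fcf6588c0d0f1df; path=/"

cookie = SimpleCookie()
cookie.load(header)

session = cookie["PHPSESSID"]
print(session.value)    # 9cde55534fcc8e136fcf6588c0d0f1df
print(session["path"])  # /
print(list(cookie))     # ['PHPSESSID']  (only one cookie is set)
```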
This is one of the ways I tried to save the file:
print "downloading with urllib2"
f = urllib2.urlopen('http://www.some.site/excel_file.php')
data = f.read()
with open("exceldoc.xls", "wb") as code:
    code.write(data)
Whether I save it or print it out, the request produces the same error:
Fatal error: Call to a member function FetchRow() on a non-object in http://www.some.site/excel_file.php on line 112
How can I download this file with Python? Many thanks in advance for your help!
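There are many similar posts; I have checked them and my examples were inspired by them, but nothing worked for me. I'm not very familiar with cookies, PHP or JS.
EDIT: this is what I get when I print out the contents of index.php: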
<html>
<head>
<title>SOMETITLE</title>
<meta http-equiv="Page-Enter" content="blendTrans(Duration=0.5)">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<link rel='stylesheet' type='text/css' href='somesite.css'>
<SCRIPT LANGUAGE="JavaScript">
<!-- JavaScript hiding
function clearDefault(obj) {
if (!obj._cleared) {
obj.value='';
obj._cleared=true;
}
}
// -->
</SCRIPT>
</head>
<body bgcolor="#FFFFFF" text="#000000">
<table width="100%" border="0" align="center" cellpadding="0" cellspacing="0">
<tr>
<td>
<table width="1000" height="150" border="0" align="center" cellpadding="16" cellspacing="0" class="header" style="background: #989896 url('images/header.png') no-repeat;">
<tr>
<td valign="middle">
<table width="100%" border="0" align="center" cellpadding="0" cellspacing="0">
<tr>
<td width="380"> </td>
<td>
<div id="login">
<form name="flogin" method="post" action="/index.php">
<h1>Login</h1>
<input name="uName" type="text" value="Username:" class="name" onfocus="clearDefault(this)">
<br>
<input type="password" name="uPw" value="Password:" class="pass" onfocus="clearDefault(this)">
<input type="submit" name="Submit" value="OK" class="submit">
</form>
</div>
</td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
</table>
</body>
</html>
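The login form in this page posts to /index.php with fields named uName and uPw (plus the Submit button), while the login code above sends 'username' and 'j_password'. If the server checks those names, the login may not actually be succeeding. A minimal Python 3 sketch that lists the form's method, action and input names with the standard-library html.parser; the class name LoginFormParser is hypothetical and the form markup is copied from the output above:

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class LoginFormParser(HTMLParser):
    """Record the action, method and <input> fields of the form named 'flogin'."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = None
        self.fields = []  # (name, type, value) for each <input> inside the form
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("name") == "flogin":
            self._in_form = True
            self.action = attrs.get("action")
            self.method = (attrs.get("method") or "get").lower()
        elif tag == "input" and self._in_form:
            self.fields.append((attrs.get("name"), attrs.get("type", "text"), attrs.get("value")))

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


form_html = '''<form name="flogin" method="post" action="/index.php">
<input name="uName" type="text" value="Username:" class="name" onfocus="clearDefault(this)">
<br>
<input type="password" name="uPw" value="Password:" class="pass" onfocus="clearDefault(this)">
<input type="submit" name="Submit" value="OK" class="submit">
</form>'''

parser = LoginFormParser()
parser.feed(form_html)
parser.close()

print(parser.method, urljoin("http://www.some.site/index.php", parser.action))
print([name for name, _, _ in parser.fields])  # ['uName', 'uPw', 'Submit']

# A payload built from the names the form actually uses
login_data = urlencode({"uName": "somebody", "uPw": "somepass", "Submit": "OK"})
print(login_data)  # uName=somebody&uPw=somepass&Submit=OK
```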
Answer:
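You could try parsing the response from your first code section and then opening the extracted URL with the same opener, which still holds the session cookie. Without knowing the actual format of the link, something like this: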
import urllib, urllib2, cookielib, os
import re # going to use this now!
username = 'somebody'
password = 'somepass'
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# Field names taken from the login form in index.php (uName, uPw, Submit)
login_data = urllib.urlencode({'uName' : username, 'uPw' : password, 'Submit' : 'OK'})
resp = opener.open('http://www.some.site/index.php', login_data)
content = resp.read()
print content
match = re.search(
    # '.' and '?' are escaped so they match literally in the URL
    r"<a\s+href=\"(?P<file_link>http://www\.some\.site/excel_file\.php\?t=\d+)\"",
    content,
    re.IGNORECASE
)
assert match is not None, "Couldn't find the file link..."
file_link = match.group('file_link')
print "downloading {} with urllib2".format(file_link)
f = opener.open(file_link)
data = f.read()
with open("exceldoc.xls", "wb") as code:
    code.write(data)
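The point of reusing the same opener is that its HTTPCookieProcessor stores PHPSESSID from the login response and attaches it to every later request to that host. The sketch below reproduces those two steps offline with Python 3's http.cookiejar (cookielib in Python 2). FakeLoginResponse is a hypothetical stand-in for the server's reply, and the URLs are the ones from the question. A request that never passes through the jar, such as the bare urlopen call in the question, carries no Cookie header. That is consistent with the PHP error the question shows, presumably because the server finds no logged-in session.

```python
import email.message
import urllib.request
from http.cookiejar import CookieJar


class FakeLoginResponse:
    """Hypothetical stand-in for the login response.

    CookieJar.extract_cookies() only calls info() on the response,
    so returning the headers is enough to run this example offline.
    """

    def __init__(self, set_cookie):
        self._headers = email.message.Message()
        self._headers["Set-Cookie"] = set_cookie

    def info(self):
        return self._headers


jar = CookieJar()

# 1. Login request; the "server" answers with the cookie quoted in the question.
login_req = urllib.request.Request(
    "http://www.some.site/index.php", data=b"uName=somebody&uPw=somepass"
)
jar.extract_cookies(
    FakeLoginResponse("PHPSESSID=9cde55534fcc8e136fcf6588c0d0f1df; path=/"), login_req
)

# 2. Later download request to the same host: the jar attaches the session cookie,
#    which is what HTTPCookieProcessor does for requests sent through the opener.
download_req = urllib.request.Request("http://www.some.site/excel_file.php?t=1370919996")
jar.add_cookie_header(download_req)
print(download_req.get_header("Cookie"))  # PHPSESSID=9cde55534fcc8e136fcf6588c0d0f1df

# 3. A request that never goes through the jar has no Cookie header at all.
bare_req = urllib.request.Request("http://www.some.site/excel_file.php?t=1370919996")
print(bare_req.get_header("Cookie"))  # None
```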