Not receiving HTML content from a GET request

Time: 2016-05-25 16:42:00

Tags: python python-requests

I need to scrape all of the available data on this website.

Here is my code:

import requests

url = "https://bpcleproc.in/EPROC/viewtender/13474"

r = requests.get(url)

print(r)
print(r.text) 

The result I get is:

<Response [200]>
<script type="text/javascript">
window.location.href = "/EPROC/viewtender/13474";
</script>

I don't understand why this isn't working. Is the site's content generated by a JS file?

1 answer:

Answer 0 (score: 4)

The site sets a bunch of cookies in a redirect loop before taking you to the final page, as this curl output shows:

$ curl -vv https://bpcleproc.in/EPROC/viewtender/13474
*   Trying 103.231.215.7...
* Connected to bpcleproc.in (103.231.215.7) port 443 (#0)
* TLS 1.0 connection using TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
* Server certificate: *.bpcleproc.in
* Server certificate: Go Daddy Secure Certificate Authority - G2
* Server certificate: Go Daddy Root Certificate Authority - G2
> GET /EPROC/viewtender/13474 HTTP/1.1
> Host: bpcleproc.in
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 302 Found
< Date: Wed, 25 May 2016 16:43:42 GMT
< Server: Apache-Coyote/1.1
< Cache-Control: no-store
< Expires: Thu, 01 Jan 1970 00:00:00 GMT
< Pragma: no-cache
< X-Frame-Options: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< X-Content-Type-Options: nosniff
< Location: http://bpcleproc.in/EPROC/setcookie
< Content-Type: UTF-8;charset=UTF-8
< Content-Language: 1
< Content-Length: 0
< Set-Cookie: JSESSIONID=D2843B0E183A813195650A281A9FCC7D.tomcat2; Path=/EPROC/; Secure; HttpOnly
< Set-Cookie: locale=1; Expires=Thu, 26-May-2016 16:43:42 GMT; Path=/EPROC/; Secure; HttpOnly
< Set-Cookie: moduleId=3; Expires=Thu, 26-May-2016 16:43:42 GMT; Path=/EPROC/; Secure; HttpOnly
< Set-Cookie: isShowDcBanner=1; Expires=Thu, 26-May-2016 16:43:42 GMT; Path=/EPROC/; Secure; HttpOnly
< Set-Cookie: listingStyle=1; Expires=Thu, 26-May-2016 16:43:42 GMT; Path=/EPROC/; Secure; HttpOnly
< Set-Cookie: logo=Untitled.jpg; Expires=Thu, 26-May-2016 16:43:42 GMT; Path=/EPROC/; Secure; HttpOnly
< Set-Cookie: theme=theme-1; Expires=Thu, 26-May-2016 16:43:42 GMT; Path=/EPROC/; Secure; HttpOnly
< Set-Cookie: dateFormat=DD/MM/YYYY; Expires=Thu, 26-May-2016 16:43:42 GMT; Path=/EPROC/; Secure; HttpOnly
< Set-Cookie: conversionValue=103; Expires=Thu, 26-May-2016 16:43:42 GMT; Path=/EPROC/; Secure; HttpOnly
< Set-Cookie: eprocLogoName=1; Expires=Thu, 26-May-2016 16:43:42 GMT; Path=/EPROC/; Secure; HttpOnly
< Set-Cookie: phoneNo=07940016868; Expires=Thu, 26-May-2016 16:43:42 GMT; Path=/EPROC/; Secure; HttpOnly
< Set-Cookie: email="support@bpcleproc.in"; Version=1; Max-Age=86400; Expires=Thu, 26-May-2016 16:43:42 GMT; Path=/EPROC/; Secure; HttpOnly
<
* Connection #0 to host bpcleproc.in left intact

You can use requests.Session to emulate that behavior:

import requests

session = requests.Session()

# First, get the cookies.
# The session keeps track of cookies and requests follows redirects for you
r = session.get("https://bpcleproc.in/EPROC/viewtender/13474")

# Then, simulate following the JS redirect
r = session.get("https://bpcleproc.in/EPROC/viewtender/13474")

print(r)
print(r.text)
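
If you would rather not hard-code the second request, the retry could be driven by the response body itself. The following is only a sketch under assumptions: it assumes the stub always has the exact shape shown in the question (a lone script tag that assigns window.location.href), and the names JS_REDIRECT_STUB, js_redirect_target, get_following_js_redirects and max_hops are made up for illustration, not part of requests. The session argument is expected to behave like requests.Session.

```python
import re
from urllib.parse import urljoin

# Assumption: the whole body is a single <script> that only assigns
# window.location.href, as in the output shown in the question.
JS_REDIRECT_STUB = re.compile(
    r"""\s*<script[^>]*>\s*window\.location\.href\s*=\s*["']([^"']+)["'];?\s*</script>\s*""",
    re.IGNORECASE,
)


def js_redirect_target(html, base_url):
    """Return the absolute URL the stub points to, or None if html is not a stub."""
    match = JS_REDIRECT_STUB.fullmatch(html)
    if match is None:
        return None
    # The stub uses a path like "/EPROC/...", so resolve it against the page URL.
    return urljoin(base_url, match.group(1))


def get_following_js_redirects(session, url, max_hops=5):
    """Fetch url, re-requesting while the body is still a JS redirect stub.

    session must provide get(url) returning an object with .text and .url,
    which is what requests.Session does. max_hops is a hypothetical safety
    limit so an endless stub loop cannot hang the script.
    """
    response = session.get(url)
    for _ in range(max_hops):
        target = js_redirect_target(response.text, response.url)
        if target is None:
            break
        response = session.get(target)
    return response
```

With requests installed, this would be called as get_following_js_redirects(requests.Session(), "https://bpcleproc.in/EPROC/viewtender/13474"). Matching the entire body rather than searching inside it is deliberate: a real page may mention window.location.href somewhere in its own scripts, and that should not be mistaken for a redirect.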