I need to scrape all the available data from this website. Here is my code:
import requests
url = "https://bpcleproc.in/EPROC/viewtender/13474"
r = requests.get(url)
print(r)
print(r.text)
The result I get is:
<Response [200]>
<script type="text/javascript">
window.location.href = "/EPROC/viewtender/13474";
</script>
I don't understand why this doesn't work. Is the site generated by a JS file?
Answer 0 (score: 4)
Before taking you to the final page, the site sets a bunch of cookies in a redirect loop, as the following trace shows:
$ curl -vv https://bpcleproc.in/EPROC/viewtender/13474
* Trying 103.231.215.7...
* Connected to bpcleproc.in (103.231.215.7) port 443 (#0)
* TLS 1.0 connection using TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
* Server certificate: *.bpcleproc.in
* Server certificate: Go Daddy Secure Certificate Authority - G2
* Server certificate: Go Daddy Root Certificate Authority - G2
> GET /EPROC/viewtender/13474 HTTP/1.1
> Host: bpcleproc.in
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 302 Found
< Date: Wed, 25 May 2016 16:43:42 GMT
< Server: Apache-Coyote/1.1
< Cache-Control: no-store
< Expires: Thu, 01 Jan 1970 00:00:00 GMT
< Pragma: no-cache
< X-Frame-Options: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< X-Content-Type-Options: nosniff
< Location: http://bpcleproc.in/EPROC/setcookie
< Content-Type: UTF-8;charset=UTF-8
< Content-Language: 1
< Content-Length: 0
< Set-Cookie: JSESSIONID=D2843B0E183A813195650A281A9FCC7D.tomcat2; Path=/EPROC/; Secure; HttpOnly
< Set-Cookie: locale=1; Expires=Thu, 26-May-2016 16:43:42 GMT; Path=/EPROC/; Secure; HttpOnly
< Set-Cookie: moduleId=3; Expires=Thu, 26-May-2016 16:43:42 GMT; Path=/EPROC/; Secure; HttpOnly
< Set-Cookie: isShowDcBanner=1; Expires=Thu, 26-May-2016 16:43:42 GMT; Path=/EPROC/; Secure; HttpOnly
< Set-Cookie: listingStyle=1; Expires=Thu, 26-May-2016 16:43:42 GMT; Path=/EPROC/; Secure; HttpOnly
< Set-Cookie: logo=Untitled.jpg; Expires=Thu, 26-May-2016 16:43:42 GMT; Path=/EPROC/; Secure; HttpOnly
< Set-Cookie: theme=theme-1; Expires=Thu, 26-May-2016 16:43:42 GMT; Path=/EPROC/; Secure; HttpOnly
< Set-Cookie: dateFormat=DD/MM/YYYY; Expires=Thu, 26-May-2016 16:43:42 GMT; Path=/EPROC/; Secure; HttpOnly
< Set-Cookie: conversionValue=103; Expires=Thu, 26-May-2016 16:43:42 GMT; Path=/EPROC/; Secure; HttpOnly
< Set-Cookie: eprocLogoName=1; Expires=Thu, 26-May-2016 16:43:42 GMT; Path=/EPROC/; Secure; HttpOnly
< Set-Cookie: phoneNo=07940016868; Expires=Thu, 26-May-2016 16:43:42 GMT; Path=/EPROC/; Secure; HttpOnly
< Set-Cookie: email="support@bpcleproc.in"; Version=1; Max-Age=86400; Expires=Thu, 26-May-2016 16:43:42 GMT; Path=/EPROC/; Secure; HttpOnly
<
* Connection #0 to host bpcleproc.in left intact
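The same hops can be inspected from Python. This is a minimal sketch, assuming `response` is the object returned by `requests.get` or `session.get` (requests keeps the redirects it followed in `response.history`). The helper name `summarize_redirects` is made up for illustration.

```python
def summarize_redirects(response):
    """Return one (status, url, location, cookie_names) tuple per hop.

    `response` is assumed to behave like a requests.Response:
    `history` lists the redirect responses that were followed,
    and each hop has `status_code`, `url`, `headers` and `cookies`.
    """
    hops = []
    for hop in list(response.history) + [response]:
        hops.append((
            hop.status_code,
            hop.url,
            hop.headers.get("Location"),   # None on the final hop
            sorted(hop.cookies.keys()),    # cookies set by this hop
        ))
    return hops
```

Printing each tuple for the URL in the question should show the 302 hop and the cookies it sets, much like the curl output above.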
You can use requests.Session to emulate that behavior:
import requests
session = requests.Session()
# First, get the cookies.
# The session keeps track of cookies and requests follows redirects for you
r = session.get("https://bpcleproc.in/EPROC/viewtender/13474")
# Then, simulate following the JS redirect
r = session.get("https://bpcleproc.in/EPROC/viewtender/13474")
print(r)
print(r.text)
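If you would rather not hard-code the second request, the JS redirect can be read out of the response body instead. The following is only a sketch under a few assumptions: the stub always has the `window.location.href = "..."` shape shown in the question, a body shorter than `max_stub_len` characters counts as a stub (an arbitrary threshold), and `session` is a `requests.Session` or anything else with a compatible `get` method. The names `extract_js_redirect` and `get_following_js_redirects` are made up for illustration.

```python
import re
from urllib.parse import urljoin

# Matches assignments such as: window.location.href = "/EPROC/viewtender/13474";
_JS_REDIRECT = re.compile(
    r"""window\.location(?:\.href)?\s*=\s*["']([^"']+)["']"""
)


def extract_js_redirect(html, base_url, max_stub_len=512):
    """Return the absolute URL a JS redirect stub points to, or None.

    Bodies longer than `max_stub_len` are treated as real pages, so a
    normal page that happens to assign window.location is not followed.
    """
    if len(html) > max_stub_len:
        return None
    match = _JS_REDIRECT.search(html)
    if match is None:
        return None
    return urljoin(base_url, match.group(1))


def get_following_js_redirects(session, url, max_hops=5):
    """GET `url`, re-requesting while the body is only a JS redirect stub."""
    response = session.get(url)
    for _ in range(max_hops):
        target = extract_js_redirect(response.text, response.url)
        if target is None:
            break
        # The session sends back the cookies collected so far.
        response = session.get(target)
    return response
```

With the session from the answer, `r = get_following_js_redirects(session, "https://bpcleproc.in/EPROC/viewtender/13474")` would replace the two explicit `session.get` calls. `max_hops` caps the number of extra requests in case the site keeps returning the stub.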