Question

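I am trying to write a web scraper that downloads a CSV file from a dynamic URL.

The URL looks like http://aaa/bbb.mcv/Download?path=xxxx.csv

When I enter this URL in Chrome, the download starts immediately and the page itself does not change.

I can't even find any request in the developer tools.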

I have tried two ways of getting the file:

Loading the URL in Selenium
driver.get(url)

Fetching the file with the requests library
requests.get(url)

Neither of them worked...

Any suggestions?

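Output of the two approaches:

Selenium: I tried taking a screenshot, but the page does not seem to change (just like in Chrome).
requests: I printed the data I got back. It looks like an HTML file, and when I opened it in a browser it turned out to be a login page.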

Answer 1

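import requests

url = '...'            # the download URL
save_location = '...'  # local path to write the file to

session = requests.Session()

# stream=True fetches the body in chunks instead of loading it all into memory
response = session.get(url, stream=True)
response.raise_for_status()  # stop here on a 4xx/5xx response
with open(save_location, 'wb') as f:
    for chunk in response.iter_content(1024):
        f.write(chunk)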
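
Since the question reports that requests returned a login page instead of the CSV, it may help to check what came back before saving it. The following is only a minimal sketch: the helper name looks_like_html is hypothetical, and the check (Content-Type header plus the first bytes of the body) is a heuristic, not something the site is known to guarantee.

```python
def looks_like_html(content_type, body_start):
    """Heuristic: True if a response seems to be an HTML page, not a CSV file.

    content_type: value of the Content-Type header (may be None).
    body_start:   the first bytes of the response body.
    """
    if "html" in (content_type or "").lower():
        return True
    head = body_start.lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


# Possible use with the download code (assumes `session` and `url` exist):
#   response = session.get(url)
#   if looks_like_html(response.headers.get("Content-Type"), response.content[:200]):
#       print("Got an HTML page, probably the login form; not saving it.")
```

If this reports an HTML page, the request most likely was not authenticated.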

Answer 2

Thanks for everyone's help!
I finally found the problem:
I log in to the website with Selenium, but I use requests to download the file.
The requests session doesn't have any of the authentication information!

So my solution is to get the cookies from Selenium first,
then pass them to requests.

Here is my code:

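import requests

# `driver` is the Selenium WebDriver that is already logged in
cookies = driver.get_cookies()

# Copy the browser's cookies into a requests session
s = requests.Session()
for cookie in cookies:
    s.cookies.set(cookie['name'], cookie['value'])
response = s.get(url)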
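
A possible variation on the code above also carries over each cookie's domain and path, which Selenium includes in the dictionaries returned by get_cookies(). This is a sketch only: the helper names cookie_kwargs and session_from_selenium_cookies are made up for illustration, and whether the extra fields matter depends on the site.

```python
def cookie_kwargs(cookie):
    """Pick the fields of a Selenium cookie dict to pass to requests."""
    kwargs = {"name": cookie["name"], "value": cookie["value"]}
    for key in ("domain", "path"):
        if cookie.get(key):
            kwargs[key] = cookie[key]
    return kwargs


def session_from_selenium_cookies(cookies):
    """Build a requests.Session holding the given Selenium cookies."""
    # Imported here so cookie_kwargs can be used without requests installed.
    import requests

    session = requests.Session()
    for cookie in cookies:
        session.cookies.set(**cookie_kwargs(cookie))
    return session


# Possible use (assumes a logged-in `driver` and a download `url`):
#   s = session_from_selenium_cookies(driver.get_cookies())
#   response = s.get(url)
```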

Download a file from a dynamic URL with selenium & phantomjs
