Question

I want to use Python to get the contents of a text file hosted on my website. The server requires JavaScript to be enabled in the browser, so when I run:

    import urllib2  # Python 2 only; Python 3 uses urllib.request

    target_url = "http://09hannd.me/ai/request.txt"
    data = urllib2.urlopen(target_url).read()  # body of the response

I get back an HTML page telling me to enable JavaScript instead of the file. Is there a way to pretend that JavaScript is enabled?
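Before trying a workaround, it can help to detect this situation programmatically. The following is a minimal sketch that assumes the check page is HTML containing markers such as `<noscript>` or `document.cookie`; the function name `looks_like_js_challenge` and the marker list are hypothetical and were not taken from the site in question.

```python
def looks_like_js_challenge(body: str, content_type: str = "") -> bool:
    """Hypothetical heuristic: True if the body looks like a 'please enable JavaScript' page."""
    lowered = body.lower()
    # A plain .txt file should not come back as HTML.
    is_html = "text/html" in content_type.lower() or lowered.lstrip().startswith("<html")
    # Assumed (not exhaustive) markers of a JavaScript-gated page.
    markers = ("<noscript", "enable javascript", "document.cookie")
    return is_html and any(m in lowered for m in markers)


challenge = ('<html><body><script>document.cookie="x=1";</script>'
             '<noscript>Please enable JavaScript</noscript></body></html>')
print(looks_like_js_challenge(challenge, "text/html"))                  # True
print(looks_like_js_challenge("find me a cafe nearby", "text/plain"))  # False
```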

Thanks

Answer 1

Selenium is the way to go here, but there is another, "hacky" option.

Based on this answer: https://stackoverflow.com/a/26393257/2517622

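    import requests

    url = 'http://09hannd.me/ai/request.txt'
    # Send the __test cookie that the site's JavaScript check would normally set in the
    # browser (value presumably copied from a browser session; it may expire).
    response = requests.get(url, cookies={'__test': '2501c0bc9fd535a3dc831e57dc8b1eb0'})
    print(response.text)  # Output: find me a cafe nearby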
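Since the question uses `urllib2`, the same cookie trick can also be expressed with only the Python 3 standard library (`urllib.request`, the successor of `urllib2`). This is a minimal sketch that assumes the `__test` cookie is the only thing the server checks; `build_request`, `fetch_text` and `TEST_COOKIE` are hypothetical names.

```python
import urllib.request

# Cookie value quoted in the answer above; it is site-specific and may stop working.
TEST_COOKIE = "2501c0bc9fd535a3dc831e57dc8b1eb0"


def build_request(url: str, cookie_value: str) -> urllib.request.Request:
    """Build a GET request that carries the __test cookie the JS check would set."""
    return urllib.request.Request(url, headers={"Cookie": f"__test={cookie_value}"})


def fetch_text(url: str, cookie_value: str) -> str:
    """Fetch the URL with the cookie attached and decode the body as UTF-8."""
    with urllib.request.urlopen(build_request(url, cookie_value), timeout=10) as resp:
        return resp.read().decode("utf-8")


# Usage (performs a network request):
# print(fetch_text("http://09hannd.me/ai/request.txt", TEST_COOKIE))
```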

Answer 2

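I would suggest a tool like this one, which drives a headless WebKit browser so the page's JavaScript actually runs: https://github.com/niklasb/dryscrape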

You can also find more information here: Using python with selenium to scrape dynamic web pages
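Selenium and a plain HTTP client can also be combined: let a real browser run the JavaScript once, then reuse its cookies for ordinary requests. This is a minimal sketch that assumes the cookies come from Selenium's `driver.get_cookies()`, which returns a list of dicts with `name` and `value` keys, and that the browser has already executed the site's check; `cookie_header` is a hypothetical helper.

```python
from typing import Iterable, Mapping


def cookie_header(cookies: Iterable[Mapping[str, object]]) -> str:
    """Join cookie dicts (each with 'name' and 'value' keys) into a Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


# Hypothetical Selenium usage (needs a browser driver, not run here):
#   driver.get("http://09hannd.me/ai/request.txt")
#   header = cookie_header(driver.get_cookies())
#   ...then send `header` as the Cookie header with urllib or requests.

sample = [{"name": "__test", "value": "abc123", "path": "/", "secure": False}]
print(cookie_header(sample))  # __test=abc123
```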

Python: get URL contents when the page requires JavaScript to be enabled
