requests and urllib2 get an error from an XBRL page: "The browser mode you are running is not compatible with this application"

Time: 2018-08-14 01:16:11

Tags: python python-requests urllib2

I am not sure why I can't get the page from this link. All I want to do is fetch it and load it into BeautifulSoup.


Here is what I tried with requests and urllib2:

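import requests
import urllib2  # Python 2 only; in Python 3 this is urllib.request

link = 'https://www.sec.gov/ix?doc=/Archives/edgar/data/1373715/000137371518000157/now-2018630x10q.htm'

# Fetch the page with requests
r = requests.get(link)

# Fetch the same page with urllib2
r2 = urllib2.urlopen(link)
html = r2.read()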

The text is the same with both... and it is not the page I want.

I also tried faking a browser by sending a User-Agent header:

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

r = requests.get(link, headers=headers)

What I get back is this page:

var note = 'The browser mode you are running is not compatible with this application.';
browserName ='Microsoft Internet Explorer';
note +='You are currently running '+browserName+' '+((ie7>0)?7:8)+'.0.';
var userAgent = window.navigator.userAgent.toLowerCase();
if(userAgent.indexOf('ipad') != -1 || userAgent.indexOf('iphone') != -1 || userAgent.indexOf('apple') != -1){
    note += ' Please use a more current version of '+browserName+' in order to use the application.';
}else if(userAgent.indexOf('android') != -1){
    note += ' Please use a more current version of Google Chrome or Mozilla Firefox in order to use the application.';
}else{
    note += ' Please use a more current version of Microsoft Internet Explorer, Google Chrome or Mozilla Firefox in order to use the application.';
}

This is not the XBRL document. I think it has something to do with XBRL, and the server expects my browser to interact with the data?

1 Answer:

Answer 0: (score: 1)

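This part of the page appears to be rendered by JavaScript. For dynamic content the most reliable option is usually selenium, but in this case you can avoid it and stick with requests.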

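The link makes it clear that the page uses the content of this document: /Archives/edgar/data/1373715/000137371518000157/now-2018630x10q.htm. You can bypass the viewer page and request the document directly.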

import requests

url = "https://www.sec.gov/Archives/edgar/data/1373715/000137371518000157/now-2018630x10q.htm"
r = requests.get(url)
html = r.text

print(html)
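
If you have many viewer links of this form, the direct URL does not need to be typed by hand. The following is a minimal sketch, assuming Python 3 and assuming every viewer link carries the document path in a `doc` query parameter as the link in the question does. The helper name `direct_document_url` is hypothetical, not part of requests or any SEC API.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit


def direct_document_url(viewer_url):
    """Return the URL of the document that an 'ix?doc=' viewer link wraps.

    Hypothetical helper: assumes the document path is stored in the
    'doc' query parameter, as in the link from the question.
    """
    parts = urlsplit(viewer_url)
    doc_paths = parse_qs(parts.query).get("doc")
    if not doc_paths:
        raise ValueError("no 'doc' query parameter in %r" % viewer_url)
    # Keep the scheme and host of the viewer link, swap in the document
    # path, and drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, doc_paths[0], "", ""))


viewer = "https://www.sec.gov/ix?doc=/Archives/edgar/data/1373715/000137371518000157/now-2018630x10q.htm"
url = direct_document_url(viewer)
```

The resulting `url` can then be passed to `requests.get` exactly as in the code above, and `r.text` handed to BeautifulSoup.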