Question

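I'm new to Python, so apologies in advance for any beginner mistakes. I'm trying to parse a "simple" web page: http://flow.gassco.no/

The first time the page is opened in a browser, I have to confirm the T&C by clicking an Accept button.

My parsing tool is built on BeautifulSoup, but I can't get at the content. When I print response.text, I get the markup shown below. How can I get past this form and accept the terms & conditions?

Here is what I'm doing: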

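#!/usr/bin/env python
import requests
import bs4

index_url = 'http://flow.gassco.no/acceptDisclaimer'

def get_video_page_urls():
    # Fetch the page and parse the returned HTML
    response = requests.get(index_url)
    # Name the parser explicitly so bs4 does not have to guess one
    soup = bs4.BeautifulSoup(response.text, 'html.parser')
    return soup

print(get_video_page_urls())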

Thanks!

     <form action="acceptDisclaimer" method="get">
     <input class="accept" type="submit" value="Accept"/>
     <input class="decline" name="decline" onclick="window.location ='http://www.gassco.no'" type="button" value="Decline"/>
     </form></div></div></div></div></div>

    <script type="text/javascript">
    var _gaq = _gaq || [];
    _gaq.push(['_setAccount', 'UA-30727768-1']);
    _gaq.push(['_trackPageview']);

    (function() {
        var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
        ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
        var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
    })();

</script>

Answer 1

You don't need to parse that content. You only need to make a request to http://flow.gassco.no/acceptDisclaimer.
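To see where that URL comes from, here is a minimal sketch that reads the form from the question's output and resolves its relative action against the page URL. It uses only the standard library. The names FormFinder, form_targets and DISCLAIMER_HTML are made up for illustration, and the HTML is a trimmed copy of the snippet in the question.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormFinder(HTMLParser):
    """Collects the action and method of every <form> start tag it sees."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            attrs = dict(attrs)
            action = attrs.get("action") or ""
            method = (attrs.get("method") or "get").lower()
            self.forms.append((action, method))


def form_targets(html, page_url):
    """Return (absolute_url, method) for each form found in the HTML."""
    finder = FormFinder()
    finder.feed(html)
    return [(urljoin(page_url, action), method) for action, method in finder.forms]


# Trimmed copy of the form printed in the question
DISCLAIMER_HTML = """
<form action="acceptDisclaimer" method="get">
<input class="accept" type="submit" value="Accept"/>
</form>
"""

print(form_targets(DISCLAIMER_HTML, "http://flow.gassco.no/"))
# [('http://flow.gassco.no/acceptDisclaimer', 'get')]
```

The Accept input has no name attribute, so as far as I can tell the browser submits no form fields when it is clicked. That would explain why a plain GET to that address is enough.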

Answer 2

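This particular site expects the cookie that it sends to the user agent along with the home page (http://flow.gassco.no) to be sent back when you accept the disclaimer. So you can make your script work by issuing two requests: one for the home page and one to accept the disclaimer. The following snippet does that and retrieves the home page: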

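import requests

# First request: the home page, which sets the cookie
url1 = 'http://flow.gassco.no/'
res1 = requests.get(url1)
# Second request: accept the disclaimer, sending that cookie back
url2 = 'http://flow.gassco.no/acceptDisclaimer/'
res2 = requests.get(url2, cookies=res1.cookies)
print(res2.text) # The actual home page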
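The same two requests can also go through a requests.Session, which stores cookies from each response and sends them on later requests, so nothing has to be passed by hand. This is only a sketch: the function name fetch_after_accepting and the 10 second timeout are my own choices, and the accept path without a trailing slash is taken from the form's action attribute in the question, not verified against the live site.

```python
from urllib.parse import urljoin

import requests

BASE_URL = "http://flow.gassco.no/"


def fetch_after_accepting(session=None, base_url=BASE_URL):
    """Visit the home page, accept the disclaimer, and return the resulting HTML.

    `session` is expected to behave like requests.Session; one is created
    if none is passed in.
    """
    if session is None:
        session = requests.Session()

    # First request: the site sets its cookie here and the session stores it.
    session.get(base_url, timeout=10).raise_for_status()

    # Second request: the stored cookie is sent back automatically.
    # The path is the form's action attribute from the question (assumption).
    accepted = session.get(urljoin(base_url, "acceptDisclaimer"), timeout=10)
    accepted.raise_for_status()
    return accepted.text
```

Calling fetch_after_accepting() with no arguments should give you the HTML to hand to bs4.BeautifulSoup(html, 'html.parser'). Reusing one session also keeps the cookie around for any further pages you fetch from the site.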
