使用beautifulsoup4网页报废的麻烦

时间:2015-10-16 06:23:42

标签: python beautifulsoup

我在线学习教程,学习如何使用beautifulsoup和Requests构建网络剪贴簿,但我一直遇到问题。

我的代码:

base_url = r"http://stackoverflow.com/"

r = requests.get(base_url)
soup = BeautifulSoup(r.content)
print(soup)

最初,它打印出一条错误消息

No parser was explicitly specified, so I'm using the best available HTML
parser for this system ("html.parser"). This usually isn't a problem, 
but if you run this code on another system, or in a different virtual 
environment, it may use a different parser and behave differently.

To get rid of this warning, change this:

 BeautifulSoup([your markup])

to this:

 BeautifulSoup([your markup], "html.parser")

 markup_type=markup_type))

然后我将代码更改为:

base_url = r"http://stackoverflow.com/"

r = requests.get(base_url)
soup = BeautifulSoup([r.content], "html.parser")
print(soup)

但它仍然显示错误。

对于我尝试过的每个网站都会发生这种情况。有什么问题?

1 个答案:

答案 0 :(得分:1)

下面的代码对我来说很好 -

import requests
from bs4 import BeautifulSoup

base_url = r"http://stackoverflow.com/"


r = requests.get(base_url)
soup = BeautifulSoup(r.text.encode('utf-8'), 'html.parser')
print(soup)

打印 -

<!DOCTYPE html>

<html>
<head>
<title>Stack Overflow</title>
<link href="//cdn.sstatic.net/stackoverflow/img/favicon.ico?v=4f32ecc8f43d" rel="shortcut icon">
<link href="//cdn.sstatic.net/stackoverflow/img/apple-touch-icon.png?v=c78bd457575a" rel="apple-touch-icon image_src">
<link href="/opensearch.xml" rel="search" title="Stack Overflow" type="application/opensearchdescription+xml">
<meta content="summary" name="twitter:card">
<meta content="stackoverflow.com" name="twitter:domain"/>
<meta content="website" property="og:type"/>
<meta content="http://cdn.sstatic.net/stackoverflow/img/apple-touch-icon@2.png?v=73d79a89bded&amp;a" itemprop="image primaryImageOfPage" property="og:image"/>
<meta content="Stack Overflow" itemprop="title name" name="twitter:title" property="og:title"/>
<meta content="Q&amp;A for professional and enthusiast programmers" itemprop="description" name="twitter:description" property="og:description"/>
<meta content="http://stackoverflow.com/" property="og:url"/>
<script src="//ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js"></script>
<script src="//cdn.sstatic.net/Js/stub.en.js?v=4e67e00a5514"></script>
<link href="//cdn.sstatic.net/stackoverflow/all.css?v=f29b1dcb2836" rel="stylesheet" type="text/css">
<link href="/feeds" rel="alternate" title="Feed of recent questions" type="application/atom+xml">
<script>
        StackExchange.init({"locale":"en","stackAuthUrl":"https://stackauth.com","serverTime":1444977291,"networkMetaHostname":"meta.stackexchange.com","routeName":"Home/Index","styleCode":true,"enableUserHovercards":true,"snippets":{"enabled":true,"domain":"stacksnippets.net"},"site":{"name":"Stack Overflow","description":"Q&A for professional and enthusiast programmers","isNoticesTabEnabled":true,"recaptchaPublicKey":"6LdchgIAAAAAAJwGpIzRQSOFaO0pU6s44Xt8aTwc","recaptchaAudioLang":"en","enableNewTagCreationWarning":true,"insertSpaceAfterNameTabCompletion":false,"globalAuthDisabled":true,"nonAsciiTags":true,"enableSocialMediaInSharePopup":true},"user":{"fkey":"da6bb7341acea9ff519605b8830f9ab4","rep":0,"isAnonymous":true,"isAnonymousNetworkWide":true,"ab":{"anon_popups":{"v":"d","g":2},"profile_integration_signup":{"v":"b","g":2}}}});
        StackExchange.using.setCacheBreakers({"js/prettify-full.en.js":"f3d53dad4c22","js/moderator.en.js":"fa05d92dbab5","js/full-anon.en.js":"3d170576686f","js/full.en.js":"72265ec4d33b","js/wmd.en.js":"93bf4b8da915","js/third-party/jquery.autocomplete.min.js":"e5f01e97f7c3","js/third-party/jquery.autocomplete.min.en.js":"","js/mobile.en.js":"f9007a314275","js/help.en.js":"69b2e9e77696","js/tageditor.en.js":"c84618a71b61","js/tageditornew.en.js":"3c95b8b827f4","js/inline-tag-editing.en.js":"de80429b1816","js/revisions.en.js":"9e897f24d78d","js/review.en.js":"07004bafa2a0","js/tagsuggestions.en.js":"d1ff9b84abe5","js/post-validation.en.js":"cdaae4616a26","js/explore-qlist.en.js":"cd6e5274146c","js/events.en.js":"56d31cc69b44","js/keyboard-shortcuts.en.js":"a8f86d8a32bb","js/external-editor.en.js":"6484cd83ad12","js/external-editor.en.js":"6484cd83ad12","js/snippet-javascript.en.js":"ad6b3ff5e697","js/snippet-javascript-codemirror.en.js":"bf736facf21d"});
        StackExchange.using("gps", function() {
             StackExchange.gps.init(true);
        });
    </script>
<script>
            StackExchange.ready(function () {
                $('#nav-tour').click(function () {
                    StackExchange.using("gps", function() {
                        StackExchange.gps.track("aboutpage.click", { aboutclick_location: "headermain" }, true);
                    });
                });
            });
        </script>
</link></link></meta></link></link></link></head>
<body class="home-page new-topbar">
<noscript><div id="noscript-padding"></div></noscript>
<div id="notify-container"></div>
<div id="overlay-header"></div>
<div id="custom-header"></div>
<div class="topbar">
<div class="topbar-wrapper">
<div class="js-topbar-dialog-corral">
<div class="topbar-dialog siteSwitcher-dialog dno">
<div class="header">
<h3><a href="//stackoverflow.com">current community</a></h3>
</div>
<div class="modal-content current-site-container">
<ul class="current-site">
<li>
<div class="related-links">
<a class="js-gps-track" data-gps-track="site_switcher.click({ item_type:6 })" href="http://chat.stackoverflow.com">chat</a>
</div>
<a class="current-site-link site-link js-gps-track" data-gps-track="
        site_switcher.click({ item_type:3 })" data-id="1" href="//stackoverflow.com">
<div class="site-icon favicon favicon-stackoverflow" title="Stack Overflow"></div>
        Stack Overflow
    </a>
</li>
<li class="related-site">
<div class="L-shaped-icon-container">
<span class="L-shaped-icon"></span>
</div>
<a class="site-link js-gps-track" data-gps-track="
            site.switch({ target_site:552, item_type:3 }),
        site_switcher.click({ item_type:4 })" data-id="552" href="http://meta.stackoverflow.com">
<div class="site-icon favicon favicon-stackoverflowmeta" title="Meta Stack Overflow"></div>
        Meta Stack Overflow
    </a>
</li>
<li class="related-site">
<div class="L-shaped-icon-container">
<span class="L-shaped-icon"></span>
</div>
<a class="site-link js-gps-track" data-gps-track="site_switcher.click({ item_type:9 })" href="//careers.stackoverflow.com?utm_source=stackoverflow.com&amp;utm_medium=site-ui&amp;utm_campaign=multicollider">
<div class="site-icon favicon favicon-careers" title="Stack Overflow Careers"></div>
                        Stack Overflow Careers
                    </a>
</li>
</ul>
</div>
<div class="header" id="your-communities-header">
<h3>
your communities        </h3>
</div>
<div class="modal-content" id="your-communities-section">
<div class="call-to-login">
<a class="login-link js-gps-track" data-gps-track="site_switcher.click({ item_type:10 })" href="https://stackoverflow.com/users/signup?ssrc=site_switcher&amp;returnurl=http%3a%2f%2fstackoverflow.com%2f">Sign up</a> or <a class="login-link js-gps-track" data-gps-track="site_switcher.click({ item_type:11 })" href="https://stackoverflow.com/users/login?ssrc=site_switcher&amp;returnurl=http%3a%2f%2fstackoverflow.com%2f">log in</a> to customize your list.
            </div>
</div>
<div class="header">
<h3><a href="//stackexchange.com/sites">more stack exchange communities</a></h3>
<a class="fr" href="http://blog.stackoverflow.com">company blog</a>
</div>
<div class="modal-content">
<div class="child-content"></div>
</div>
</div>
</div>
<div class="network-items">
<a class="topbar-icon icon-site-switcher yes-hover js-site-switcher-button js-gps-track" data-gps-track="site_switcher.show" href="//stackexchange.com" title="A list of all 150 Stack Exchange sites">
<span class="hidden-text">Stack Exchange</span>
</a>
<a class="topbar-icon icon-inbox yes-hover js-inbox-button" href="#" title="Recent inbox messages">
<span class="hidden-text">Inbox</span>
<span class="unread-count" style="display:none"></span>
</a>
<a class="topbar-icon icon-achievements yes-hover js-achievements-button " data-unread-class="" href="#" title="Recent achievements: reputation, badges, and privileges earned">
<span class="hidden-text">Reputation and Badges</span>
<span class="unread-count" style="display:none">
</span>
</a>
</div>
<div class="topbar-links">
<div class="links-container">
<span class="topbar-menu-links">
<a class="login-link" href="https://stackoverflow.com/users/signup?ssrc=head&amp;returnurl=http%3a%2f%2fstackoverflow.com%2f">sign up</a>
<a class="login-link" href="https://stackoverflow.com/users/login?ssrc=head&amp;returnurl=http%3a%2f%2fstackoverflow.com%2f">log in</a>
<a href="/tour">tour</a>
<a class="icon-help js-help-button" href="#" title="Help Center and other resources">
        help
        <span class="triangle"></span>
</a>
<div class="topbar-dialog help-dialog js-help-dialog dno">
<div class="modal-content">
<ul>
<li>
<a class="js-gps-track" data-gps-track="help_popup.click({ item_type:1 })" href="/tour">
                            Tour
                            <span class="item-summary">
                                Start here for a quick overview of the site
                            </span>
</a>
</li>
<li>
<a class="js-gps-track" data-gps-track="help_popup.click({ item_type:4 })" href="/help">
                        Help Center
                        <span class="item-summary">
                            Detailed answers to any questions you might have
                        </span>
</a>
</li>
<li>
<a class="js-gps-track" data-gps-track="help_popup.click({ item_type:2 })" href="//meta.stackoverflow.com">
                            Meta
                            <span class="item-summary">
                                Discuss the workings and policies of this site
                            </span>
</a>
</li>
</ul>
</div>
</div>
<a href="//careers.stackoverflow.com?utm_source=stackoverflow.com&amp;utm_medium=site-ui&amp;utm_campaign=anon-topbar">stack overflow careers</a>
</span>
</div>
<div class="search-container">
<form action="/search" autocomplete="off" id="search" method="get">
<input autocomplete="off" maxlength="240" name="q" placeholder="search" tabindex="1" type="text" value=""/>
</form>
</div>
</div>
</div>
</div>
<script>
        StackExchange.ready(function() { StackExchange.topbar.init(); });
    </script>
<div class="container">
<div id="header">
<br class="cbt">
<div id="hlogo">
<a href="/">
                    Stack Overflow
                </a>
</div>
<div id="hmenus">
<div class="nav mainnavs">
<ul>
<li><a href="/questions" id="nav-questions">Questions</a></li>
<li><a href="/tags" id="nav-tags">Tags</a></li>
<li><a href="/users" id="nav-users">Users</a></li>
<li><a href="/help/badges" id="nav-badges">Badges</a></li>
<li><a href="/unanswered" id="nav-unanswered">Unanswered</a></li>
</ul>
</div>
<div class="nav askquestion">
<ul>
<li>
<a href="/questions/ask" id="nav-askquestion">Ask Question</a>
</li>
</ul>
</div>
</div>
</br></div>
<div class="snippet-hidden" id="content">
<div id="herobox">
<div id="hero-content">
<div id="close"><a title="click to minimize">_</a></div>
<div id="blurb">
            Stack Overflow is a question and answer site for professional and enthusiast programmers. It's 100% free.
            <br/>
<br/>
<a class="button" href="/users/signup?ssrc=hero&amp;returnurl=http%3a%2f%2fstackoverflow.com%2f" id="tell-me-more">Sign up</a>
</div>
<div id="desc">
<b>Here's how it works:</b>
<ol id="hiw">
<li id="q">Anybody can ask a question
                </li>
<li id="an">Anybody can answer
                </li>
<li id="b">The best answers are voted up and rise to the top
                </li>
</ol>
</div>
<div style="clear: both"></div>
</div>
<script>
        StackExchange.ready(function () {

            var location = 0;
            if ($("body").hasClass("questions-page")) {
                location = 1;;
            } else if ($("body").hasClass("question-page")) {
                location = 1;;
            } else if ($("body").hasClass("faq-page")) {
                location = 5;;
            } else if ($("body").hasClass("home-page")) {
                location = 3;;
            }

            $('#herobox li').click(function () {
                StackExchange.using("gps", function () {
                    StackExchange.gps.track("aboutpage.click", { aboutclick_location: "hero" }, true);
                });

                window.location.href = '/tour';
            });
            $('#tell-me-more').click(function () {
                StackExchange.using("gps", function () {
                    StackExchange.gps.track("hero.action", { hero_action_type: 'cta', location: location }, true);
                });
            });
            $('#herobox #close').click(function () {
                StackExchange.using("gps", function () {
                    StackExchange.gps.track("hero.action", { hero_action_type: "minimize", location: location }, true);
                });
                $.cookie("hero", "mini", { path: "/", expires: 365 });
                $.ajax({
                    url: "/hero-mini",
                    success: function (data) {
                        $("#herobox").fadeOut("fast", function () {
                            $("#herobox").replaceWith(data);
                            $("#herobox-mini").fadeIn("fast");
                        });
                    }
                });
                return false;
            });
        });

    </script>
</div>
<script>
        StackExchange.using("gps", function () {
            StackExchange.gps.track("hero.show", { hero_type: "hero" }, true);
        });
    </script>
<div id="mainbar">
<div class="subheader">
<h1 id="h-top-questions">
                Top Questions
        </h1>
<div id="tabs">
<a class="youarehere" data-nav-xhref="" data-value="interesting" href="?tab=interesting" title="Questions that may be of interest to you based on your history and tag preference">interesting</a>
<a data-nav-xhref="" data-value="featured" href="?tab=featured" title="Questions with an active bounty"><span class="bounty-indicator-tab">414</span>featured</a>
<a data-nav-xhref="" data-value="hot" href="?tab=hot" title="Questions with the most views, answers, and votes over the last few days">hot</a>
<a data-nav-xhref="" data-value="week" href="?tab=week" title="Questions with the most views, answers, and votes this week">week</a>
<a data-nav-xhref="" data-value="month" href="?tab=month" title="Questions with the most views, answers, and votes this month">month</a>
</div>
</div>
<div id="qlist-wrapper">
<div id="question-mini-list">
<div class="question-summary narrow" id="question-summary-33163422">
<div class="cp" onclick="window.location.href='/questions/33163422/layout-became-unresponsive-while-click-on-navigation-drawer'">
<div class="votes">
<div class="mini-counts"><span title="1 vote">1</span></div>
<div>vote</div>
</div>
<div class="status unanswered">
<div class="mini-counts"><span title="0 answers">0</span></div>
<div>answers</div>
</div>
<div class="views">
<div class="mini-counts"><span title="14 views">14</span></div>
<div>views</div>
</div>
</div>
<div class="summary">
<h3><a class="question-hyperlink" href="/questions/33163422/layout-became-unresponsive-while-click-on-navigation-drawer" title="This is my main activity file which contains the drawer layout.

public abstract class ParentActivity extends AppCompatActivity implements NetworkStateReceiverListener {
    private final String TAG = ...">Layout Became Unresponsive while click on navigation drawer</a></h3>
<div class="tags t-android t-listview t-android-navigation-drawer">
<a class="post-tag" href="/questions/tagged/android" rel="tag" title="show questions tagged 'android'"><img alt="" class="sponsor-tag-img" height="16" src="//i.stack.imgur.com/tKsDb.png" width="18">android</img></a> <a class="post-tag" href="/questions/tagged/listview" rel="tag" title="show questions tagged 'listview'">listview</a> <a class="post-tag" href="/questions/tagged/android-navigation-drawer" rel="tag" title="show questions tagged 'android-navigation-drawer'">android-navigation-drawer</a>
</div>
<div class="started">
<a class="started-link" href="/questions/33163422/layout-became-unresponsive-while-click-on-navigation-drawer">modified <span class="relativetime" title="2015-10-16 06:33:57Z">54 secs ago</span></a>
<a href="/users/5452389/jaggs">Jaggs</a> <span class="reputation-score" dir="ltr" title="reputation score ">6</span>
</div>
</div>
</div>
<div class="question-summary narrow" id="question-summary-33163939">
<div class="cp" onclick="window.location.href='/questions/33163939/cannot-calculate-double-over-time-within-my-method'">
<div class="votes">
<div class="mini-counts"><span title="0 votes">0</span></div>
<div>votes</div>
</div>
<div class="status unanswered">
<div class="mini-counts"><span title="0 answers">0</span></div>
<div>answers</div>
</div>
<div class="views">
<div class="mini-counts"><span title="3 views">3</span></div>
<div>views</div>
</div>
</div>
<div class="summary">
<h3><a class="question-hyperlink" href="/questions/33163939/cannot-calculate-double-over-time-within-my-method" title='I am having trouble calculating the double over time (2.0m * the hourly rate) within my "gross_pay" method once the user enters hours that go over 60. I have done the regular gross pay (hours* rate of ...'>Cannot calculate double over time within my method.</a></h3>
<div class="tags t-c├▒ t-debugging t-methods t-logic">
<a class="post-tag" href="/questions/tagged/c%23" rel="tag" title="show questions tagged 'c#'">c#</a> <a class="post-tag" href="/questions/tagged/debugging" rel="tag" title="show questions tagged 'debugging'">debugging</a> <a class="post-tag" href="/questions/tagged/methods" rel="tag" title="show questions tagged 'methods'">methods</a> <a class="post-tag" href="/questions/tagged/logic" rel="tag" title="show questions tagged 'logic'">logic</a>
</div>
<div class="started">
<a class="started-link" href="/questions/33163939/cannot-calculate-double-over-time-within-my-method">asked <span class="relativetime" title="2015-10-16 06:33:50Z">1 min ago</span></a>
<a href="/users/5452512/chocolatesnow">ChocolateSnow</a> <span class="reputation-score" dir="ltr" title="reputation score ">1</span>
</div>
</div>
</div>
<div class="question-summary narrow" id="question-summary-33163583">
<div class="cp" onclick="window.location.href='/questions/33163583/is-it-possible-to-automate-hybrid-appios-and-android-by-using-protractor-witho'">
<div class="votes">
<div class="mini-counts"><span title="0 votes">0</span></div>
<div>votes</div>
</div>
<div class="status unanswered">
<div class="mini-counts"><span title="0 answers">0</span></div>
<div>answers</div>
</div>
<div class="views">
<div class="mini-counts"><span title="4 views">4</span></div>
<div>views</div>
</div>
</div>
<div class="summary">
<h3><a class="question-hyperlink" href="/questions/33163583/is-it-possible-to-automate-hybrid-appios-and-android-by-using-protractor-witho" title="Can any one clear this up for me, is protractor a tool or a framework?

As per my understanding protractor is a framework on which selenium on built on top of.

If I want to make a mobile app (hybrid ...">is it possible to automate hybrid app(ios and android) by using protractor without using appium</a></h3>
<div class="tags t-android t-ios t-selenium-webdriver t-protractor t-appium">
<a class="post-tag" href="/questions/tagged/android" rel="tag" title="show questions tagged 'android'"><img alt="" class="sponsor-tag-img" height="16" src="//i.stack.imgur.com/tKsDb.png" width="18">android</img></a> <a class="post-tag" href="/questions/tagged/ios" rel="tag" title="show questions tagged 'ios'">ios</a> <a class="post-tag" href="/questions/tagged/selenium-webdriver" rel="tag" title="show questions tagged 'selenium-webdriver'">selenium-webdriver</a> <a class="post-tag" href="/questions/tagged/protractor" rel="tag" title="show questions tagged 'protractor'">protractor</a> <a class="post-tag" href="/questions/tagged/appium" rel="tag" title="show questions tagged 'appium'">appium</a>
</div>
<div class="started">
<a class="started-link" href="/questions/33163583/is-it-possible-to-automate-hybrid-appios-and-android-by-using-protractor-witho">modified <span class="relativetime" title="2015-10-16 06:33:37Z">1 min ago</span></a>
<a href="/users/26931/armstrongest">Armstrongest</a> <span class="reputation-score" dir="ltr" title="reputation score ">7,461</span>
</div>
</div>
</div>
<div class="question-summary narrow" id="question-summary-33161882">
<div class="cp" onclick="window.location.href='/questions/33161882/resourceadapter-config-properties-for-wmq-7-5-taking-defaults-instead-of-specifi'">
<div class="votes">
<div class="mini-counts"><span title="0 votes">0</span></div>
<div>votes</div>
</div>
<div class="status unanswered">
<div class="mini-counts"><span title="0 answers">0</span></div>
<div>answers</div>
</div>
<div class="views">
<div class="mini-counts"><span title="4 views">4</span></div>
<div>views</div>
</div>
</div>
<div class="summary">
<h3><a class="question-hyperlink" href="/questions/33161882/resourceadapter-config-properties-for-wmq-7-5-taking-defaults-instead-of-specifi" title='I have a resource adapter for Websphere MQ 7.5

             &lt;resource-adapter id="wmq.jmsra.rar"&gt;
                &lt;archive&gt;
                    wmq.jmsra.rar
                ...'>resourceAdapter config properties for WMQ 7.5 taking defaults instead of specified configs</a></h3>
<div class="tags t-websphere-mq t-wildfly">
<a class="post-tag" href="/questions/tagged/websphere-mq" rel="tag" title="show questions tagged 'websphere-mq'">websphere-mq</a> <a class="post-tag" href="/questions/tagged/wildfly" rel="tag" title="show questions tagged 'wildfly'">wildfly</a>
</div>
<div class="started">
<a class="started-link" href="/questions/33161882/resourceadapter-config-properties-for-wmq-7-5-taking-defaults-instead-of-specifi">modified <span class="relativetime" title="2015-10-16 06:33:37Z">1 min ago</span></a>
<a href="/users/579435/sarmahdi">sarmahdi</a> <span class="reputation-score" dir="ltr" title="reputation score ">173</span>
</div>
</div>
</div>
<div class="question-summary narrow" id="question-summary-33160331">
<div class="cp" onclick="window.location.href='/questions/33160331/c-program-crashes-when-calling-second-web-service-using-libcurl'">
<div class="votes">
<div class="mini-counts"><span title="0 votes">0</span></div>
<div>votes</div>
</div>
<div class="status answered">
<div class="mini-counts"><span title="1 answer">1</span></div>
<div>answer</div>
</div>
<div class="views">
<div class="mini-counts"><span title="16 views">16</span></div>
<div>views</div>
</div>
</div>
<div class="summary">
<h3><a class="question-hyperlink" href="/questions/33160331/c-program-crashes-when-calling-second-web-service-using-libcurl" title="I have a C program that execute a webservice and then depending of the result of that web service it execute a second web service, but when I call curl_easy_perform(curl) on the second web service my ...">C- program crashes when calling second web service using libcurl</a></h3>
<div class="tags t-c t-libcurl">
<a class="post-tag" href="/questions/tagged/c" rel="tag" title="show questions tagged 'c'">c</a> <a class="post-tag" href="/questions/tagged/libcurl" rel="tag" title="show questions tagged 'libcurl'">libcurl</a>
</div>
<div class="started">
.................. AND CONTINUED............