Why can't I parse img tags on a Facebook app page with BeautifulSoup?

Asked: 2012-03-16 03:45:27

Tags: python html-parsing beautifulsoup

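I'm building a function that uses Python's requests and BeautifulSoup to extract image source URLs from web pages. It works fine on most pages, but when I try it on a Facebook app page, BeautifulSoup can't find any image elements at all.

When I inspect the HTML that comes back to my server, I notice that the Facebook page hides its images inside commented-out sections of the markup; the visible HTML is generated dynamically.

My question is: what is the best way to extract a fully formed img tag string that sits inside an HTML comment and is not actually part of the DOM? Is this simply a job for a regular expression, or can I get BeautifulSoup to see it?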

Facebook app example:
http://www.facebook.com/cocacola/app_106795496113635

Code:

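import requests
from bs4 import BeautifulSoup  # assumes the bs4 package; the original omitted this import

url = "http://www.facebook.com/cocacola/app_106795496113635"  # the example page above
r = requests.get(url, allow_redirects=True)
html = r.text
soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly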

HTML

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html lang="en" id="facebook" class="no_js">
    <head>
        <meta charset="utf-8">
        <script type="text/javascript">
function envFlush(a){function b(c){for(var d in a)c[d]=a[d];}if(window.requireLazy){requireLazy(['Env'],b);}else{Env=window.Env||{};b(Env);}} envFlush({"user":"0","locale":"en_US","method":"GET","svn_rev":524375,"vip":"69.171.234.48","static_base":"http:\/\/static.ak.fbcdn.net\/","www_base":"http:\/\/www.facebook.com\/","rep_lag":2,"fb_dtsg":"AQAe8p1e","ajaxpipe_token":"AXjpiPEj5XnbBS6r","lhsh":"hAQFiKNUl","tracking_domain":"http:\/\/pixel.facebook.com","retry_ajax_on_network_error":"1","html5_audio":"1","fbid_emoticons":"1"});
        </script>
        <script type="text/javascript">
envFlush({"eagleEyeConfig":{"seed":"2xfQ"}});CavalryLogger=false;window._script_path = "\/profile_page_timeline.php:app_{N}";window._incorporate_fragment = true;
        </script>
        <meta http-equiv="refresh" content="0; URL=/cocacola/app_106795496113635?_fb_noscript=1">
        <meta name="robots" content="noodp, noydir">
        <meta name="description" content=" Facebook is a social utility that connects people with friends and others who work, study and live around them. People use Facebook to keep up with friends, upload an unlimited number of photos, post links and videos, and learn more about the people they meet.">
        <link rel="alternate" media="handheld" href="http://www.facebook.com/cocacola/app_106795496113635">
        <title>
            Coca-Cola - Food/Beverages - Your Stories | Facebook
        </title>
        <meta name="title" content="Coca-Cola - Food/Beverages | Facebook">
        <link rel="shortcut icon" href="http://static.ak.fbcdn.net/rsrc.php/yi/r/q9U99v3_saj.ico">
        <meta http-equiv="X-Frame-Options" content="deny">
        <link type="text/css" rel="stylesheet" href="http://static.ak.fbcdn.net/rsrc.php/v1/yU/r/nJkT6_kk3B4.css">
        <link type="text/css" rel="stylesheet" href="http://static.ak.fbcdn.net/rsrc.php/v1/yB/r/iE5kZURGsmn.css">
        <link type="text/css" rel="stylesheet" href="http://static.ak.fbcdn.net/rsrc.php/v1/yq/r/-R1g7OGqrFd.css">
    </head>
    <body>
        <script type="text/javascript" src="http://static.ak.fbcdn.net/rsrc.php/v1/y5/r/lv-mu7kxrY8.js">
</script><script type="text/javascript">
window.Bootloader && Bootloader.done(["RlYvb"]);
        </script>
        <div id="FB_HiddenContainer" style="position:absolute; top:-10000px; width:0px; height:0px;"></div>
        <div id="pagelet_bluebar" data-referrer="pagelet_bluebar">
            <div id="blueBarHolder" class="loggedOut">
                <div id="blueBar">
                    <div class="loggedout_menubar_container">
                        <div class="clearfix loggedout_menubar">
                            <a class="lfloat" href="/" title="Go to Facebook Home"><i class="fb_logo img sp_6jxgq1 sx_df432d"><u>Facebook logo</u></i></a>
                            <div class="rfloat">
                                <div class="menu_login_container">
                                    <form id="login_form" action="https://www.facebook.com/login.php?login_attempt=1" method="post" onsubmit="return Event.__inlineSubmit(this,event)" name="login_form">
                                        <input type="hidden" autocomplete="off" name="post_form_id" value="eeb7846832efec96a2e64cba95741522"><input type="hidden" name="lsd" value="VsBjJ" autocomplete="off"><input type="hidden" autocomplete="off" id="locale" name="locale" value="en_US">
                                        <table cellspacing="0">
                                            <tr>
                                                <td class="html7magic">
                                                    <label for="email">Email</label>
                                                </td>
                                                <td class="html7magic">
                                                    <label for="pass">Password</label>
                                                </td>
                                            </tr>
                                            <tr>
                                                <td>
                                                    <input type="text" class="inputtext" name="email" id="email" value="" tabindex="1">
                                                </td>
                                                <td>
                                                    <input type="password" class="inputtext" name="pass" id="pass" tabindex="2">
                                                </td>
                                                <td>
                                                    <label class="uiButton uiButtonConfirm" id="loginbutton" for="uny39o_1"><input value="Log In" tabindex="4" type="submit" id="uny39o_1"></label>
                                                </td>
                                            </tr>
                                            <tr>
                                                <td class="login_form_label_field">
                                                    <div class="uiInputLabel">
                                                        <input id="persist_box" type="checkbox" name="persistent" value="1" tabindex="3" class="uiInputLabelCheckbox"><label for="persist_box">Keep me logged in</label>
                                                    </div><input type="hidden" name="default_persistent" value="0">
                                                </td>
                                                <td class="login_form_label_field">
                                                    <a href="http://www.facebook.com/recover.php" rel="nofollow">Forgot your password?</a>
                                                </td>
                                            </tr>
                                        </table><input type="hidden" autocomplete="off" id="next" name="next" value="http://www.facebook.com/cocacola/app_106795496113635"><input type="hidden" name="charset_test" value="€,´,€,´,水,Д,Є"><input type="hidden" autocomplete="off" id="lsd" name="lsd" value="VsBjJ"><input type="hidden" autocomplete="off" name="timezone" value="" id="uny39o_2"><input type="hidden" name="lgnrnd" value="202334_yY_m"><input type="hidden" id="lgnjs" name="lgnjs" value="n">
                                    </form>
                                </div>
                            </div>
                        </div>
                    </div>
                </div>
            </div>
        </div>
        <div id="globalContainer" class="uiContextualLayerParent">
            <div id="content" class="fb_content clearfix">
                <div itemscope="itemscope" itemtype="http://data-vocabulary.org/person">
                    <div id="mainContainer">
                        <div id="leftColContainer">
                            <div id="leftCol"></div>
                        </div>
                        <div id="contentCol" class="clearfix hasRightCol">
                            <div id="rightCol" role="complementary">
                                <div id="rightColContent"></div>
                            </div>
                            <div id="contentArea" role="main">
                                <div id="pagelet_timeline_main_column" data-referrer="pagelet_timeline_main_column" data-gt="{&quot;profile_owner&quot;:&quot;40796308305&quot;,&quot;ref&quot;:&quot;timeline:app_106795496113635&quot;}"></div>
                            </div>
                            <div id="bottomContent"></div>
                        </div>
                    </div>
                </div>
            </div>
            <div id="pageFooter" data-referrer="page_footer">
                <div id="contentCurve"></div>
                <div class="clearfix" id="footerContainer">
                    <div class="mrl lfloat" role="contentinfo">
                        <div class="fsm fwn fcg">
                            <span>Facebook © 2012</span> · <a rel="dialog" href="/ajax/intl/language_dialog.php?uri=http%3A%2F%2Fwww.facebook.com%2Fcocacola%2Fapp_106795496113635" title="Use Facebook in another language.">English (US)</a>
                        </div>
                    </div>
                    <div class="navigation fsm fwn fcg" role="navigation">
                        <a href="http://www.facebook.com/mobile/?ref=pf" title="Check out Facebook Mobile.">Mobile</a> · <a href="http://www.facebook.com/find-friends?ref=pf" title="Find anyone on the web.">Find Friends</a> · <a href="http://www.facebook.com/badges/?ref=pf" title="Embed a Facebook badge on your website.">Badges</a> · <a href="http://www.facebook.com/directory/people/" title="Browse our people directory.">People</a> · <a href="http://www.facebook.com/directory/pages/" title="Browse our pages directory.">Pages</a> · <a href="http://www.facebook.com/facebook" accesskey="8" title="Read our blog, discover the resource center, and find job opportunities.">About</a> · <a href="http://www.facebook.com/campaign/landing.php?placement=pflo&amp;campaign_id=402047449186&amp;extra_1=auto" title="Advertise on Facebook.">Advertising</a> · <a href="http://www.facebook.com/pages/create.php?ref_type=sitefooter" title="Create a Page">Create a Page</a> · <a href="http://developers.facebook.com/?ref=pf" title="Develop on our platform.">Developers</a> · <a href="http://www.facebook.com/careers/?ref=pf" title="Make your next career move to our awesome company.">Careers</a> · <a href="http://www.facebook.com/privacy/explanation" title="Learn about your privacy and Facebook.">Privacy</a> · <a href="http://www.facebook.com/legal/terms?ref=pf" accesskey="9" title="Review our terms of service.">Terms</a> · <a href="http://www.facebook.com/help/?ref=pf" accesskey="0" title="Visit our Help Center.">Help</a>
                    </div>
                </div>
            </div>
        </div><script type="text/javascript">
/*<![CDATA[*/function si_cj(m){setTimeout(function(){new Image().src="http:\/\/error.facebook.com\/common\/scribe_endpoint.php?c=si_clickjacking&t=6215"+"&m="+m;},5000);}if(top!=self && !false){try{if(parent!=top){throw 1;}var si_cj_d=["apps.facebook.com","\/pages\/","apps.beta.facebook.com"];var href=top.location.href.toLowerCase();for(var i=0;i<si_cj_d.length;i++){if (href.indexOf(si_cj_d[i])>=0){throw 1;}}si_cj("3 ");}catch(e){si_cj("1 \t");window.document.write("\u003Cstyle>body * {display:none !important;}\u003C\/style>\u003Ca href=\"#\" onclick=\"top.location.href=window.location.href\" style=\"display:block !important;padding:10px\">\u003Ci class=\"img sp_46v94c sx_401f21\" style=\"display:block !important\">\u003C\/i>Go to Facebook.com\u003C\/a>");/*ypn0bXTr*/}}/*]]>*/
        </script><script type="text/javascript">
Bootloader.setResourceMap({"VhLvJ":{"type":"css","src":"http:\/\/static.ak.fbcdn.net\/rsrc.php\/v1\/yU\/r\/nJkT6_kk3B4.css"},"sbVQp":{"type":"css","permanent":1,"src":"http:\/\/static.ak.fbcdn.net\/rsrc.php\/v1\/yB\/r\/iE5kZURGsmn.css"},"0NL5c":{"type":"css","permanent":1,"src":"http:\/\/static.ak.fbcdn.net\/rsrc.php\/v1\/yq\/r\/-R1g7OGqrFd.css"},"VDymv":{"type":"css","permanent":1,"src":"http:\/\/static.ak.fbcdn.net\/rsrc.php\/v1\/y6\/r\/mA6ahNFI0KJ.css"}});Bootloader.setResourceMap({"Q6HMA":{"type":"js","src":"http:\/\/static.ak.fbcdn.net\/rsrc.php\/v1\/y6\/r\/-B_fBBv_220.js"},"y3kOn":{"type":"js","src":"http:\/\/static.ak.fbcdn.net\/rsrc.php\/v1\/yz\/r\/WNADMmAL4i0.js"},"cNca2":{"type":"js","src":"http:\/\/static.ak.fbcdn.net\/rsrc.php\/v1\/yh\/r\/8iHYobZ_uUW.js"},"xbu5O":{"type":"js","src":"http:\/\/static.ak.fbcdn.net\/rsrc.php\/v1\/yf\/r\/MsjbZFUA3CA.js"},"IdMsN":{"type":"js","src":"http:\/\/static.ak.fbcdn.net\/rsrc.php\/v1\/yG\/r\/NSXA8EZYqOA.js"},"oW\/FK":{"type":"js","src":"http:\/\/static.ak.fbcdn.net\/rsrc.php\/v1\/yX\/r\/J7VcVoS5R35.js"},"KuxPB":{"type":"js","src":"http:\/\/static.ak.fbcdn.net\/rsrc.php\/v1\/yM\/r\/1H5Y5NQHMnu.js"},"rZSx8":{"type":"js","src":"http:\/\/static.ak.fbcdn.net\/rsrc.php\/v1\/yF\/r\/AMUiJJrPh_6.js"},"H42Jh":{"type":"js","src":"http:\/\/static.ak.fbcdn.net\/rsrc.php\/v1\/y3\/r\/ppwOo4BAmlb.js"},"Z5N10":{"type":"js","src":"http:\/\/static.ak.fbcdn.net\/rsrc.php\/v1\/yq\/r\/LjiCoz6UKXG.js"}}); 
Bootloader.enableBootload({"ErrorSignal":{"resources":["Q6HMA","cNca2"],"module":true},"Dialog":{"resources":["Q6HMA","sbVQp"],"module":true},"json":{"resources":[],"module":true,"runWhenReady":true},"DOM":{"resources":["Q6HMA"],"module":true},"HTML":{"resources":["Q6HMA"],"module":true},"event-extensions":{"resources":["Q6HMA"],"module":true},"legacy:dialog":{"resources":["Q6HMA","sbVQp"]},"IframeShim":{"resources":["Q6HMA","xbu5O"],"module":true},"legacy:ajaxpipe":{"resources":["Q6HMA"]},"legacy:async":{"resources":["Q6HMA"]},"legacy:PhotoSnowlift":{"resources":["Q6HMA","sbVQp"]},"fb-photos-snowlift-css":{"resources":["sbVQp"]},"Live":{"resources":["Q6HMA","IdMsN","oW\/FK"],"module":true},"PhotoTagApproval":{"resources":["Q6HMA","KuxPB"],"module":true},"PhotoTagger":{"resources":["Q6HMA","sbVQp","KuxPB"],"module":true},"PhotoTags":{"resources":["Q6HMA","KuxPB"],"module":true},"PhotoViewerSubscribe":{"resources":["Q6HMA","rZSx8"],"module":true},"TagTokenizer":{"resources":["Q6HMA","KuxPB"],"module":true},"fb-photos-snowlift-fullscreen-css":{"resources":["VDymv"]},"VideoRotate":{"resources":["Q6HMA","H42Jh"],"module":true},"AsyncResponse":{"resources":["Q6HMA"],"module":true},"PhotoInlineEditor":{"resources":["Q6HMA","Z5N10"],"module":true},"Form":{"resources":["Q6HMA"],"module":true},"DOMScroll":{"resources":["Q6HMA"],"module":true},"legacy:Toggler":{"resources":["Q6HMA","sbVQp"]},"legacy:dom-form":{"resources":["Q6HMA"]},"legacy:Tooltip":{"resources":["Q6HMA","sbVQp"]},"Input":{"resources":["Q6HMA"],"module":true},"dimension-tracking":{"resources":["Q6HMA"]},"detect-broken-proxy-cache":{"resources":["Q6HMA"]},"link-rel-preload":{"resources":["Q6HMA"]}});Arbiter.registerCallback(InitialJSLoader.callback, ["BOOTLOAD\/ROADRUNNER_READY"]);Arbiter.registerCallback(function() {InitialJSLoader.load(["Q6HMA","y3kOn"]);Arbiter.inform("BOOTLOAD\/ROADRUNNER_READY", true, Arbiter.BEHAVIOR_STATE);}, [OnloadEvent.ONLOAD_DOMCONTENT_CALLBACK]);
        </script><script type="text/javascript">
Bootloader.configurePage(["VhLvJ","sbVQp","0NL5c"]); Bootloader.done(["jDr+c"]); JSCC.init(({"j0E2hENIaexzaBuzHe1":function(){return new AsyncLayout();}})); new (require("ServerJS"))().handle({"require":[["WidePageController"],["LoginFormController","init",[],[{"__e":"login_form","root":null},{"__e":"loginbutton","root":null}]],["TinyViewport"]]}); onloadRegister_DEPRECATED(function (){Arbiter.inform("UserAction\/loadSamplingRates", [{"ns":"timeline","ua_id":"scrubber","rate":100},{"ns":"test","ua_id":"test","rate":1},{"ns":"groups","ua_id":"create_dialog","rate":10}], Arbiter.BEHAVIOR_PERSISTENT)}); onloadRegister_DEPRECATED(function (){Arbiter.inform("UserAction/enable", ["events"], Arbiter.BEHAVIOR_PERSISTENT);}); onloadRegister_DEPRECATED(function (){JSCC.get('j0E2hENIaexzaBuzHe1').init("contentArea");}); onloadRegister_DEPRECATED(function (){window.intl_locale_rewrites = {"meta":{"\/_B\/":"^(.*[.,!?\\s]|)","\/_E\/":"([.,!?\\s].*|)$"},"patterns":{"\/\u0001(.*)('|&#039;)s\u0001(?:'|&#039;)s(.*)\/":"\u0001$1$2s\u0001$3","\/_\u0001([^\u0001]*)\u0001\/e":"mb_strtolower(\"\u0001$1\u0001\")","\/\\^\\x01([^\\x01])(?=[^\\x01]*\\x01)\/e":"mb_strtoupper(\"\u0001$1\")","\/_\u0001([^\u0001]*)\u0001\/":"javascript"}};}); onloadRegister_DEPRECATED(function (){$("uny39o_2").value = tz_calculate(1331868214)}); onloadRegister_DEPRECATED(function (){try { $("email").focus(); } catch (_ignore) { }}); onafterloadRegister_DEPRECATED(function (){Bootloader.loadComponents(["dimension-tracking"], function(){ });}); onafterloadRegister_DEPRECATED(function (){Bootloader.loadComponents(["detect-broken-proxy-cache"], function(){ detect_broken_proxy_cache("0", "c_user") });}); onafterloadRegister_DEPRECATED(function (){Bootloader.loadComponents(["link-rel-preload"], function(){ link_rel_preload() });}); 
        </script><script type="text/javascript">
var big_pipe = new BigPipe({"lid":0,"rrEnabled":1,"forceFinish":true,"delay":0,"jsEarlier":0});
        </script><script type="text/javascript">
big_pipe.onPageletArrive({"phase":0,"id":"first_response","is_last":true,"css":["VhLvJ","sbVQp","0NL5c"],"js":["Q6HMA","y3kOn"]});
        </script>
        <p>
            <code class="hidden_elem" id="uny39o_3"><!-- <div id="pagelet_main_column_personal" data-referrer="pagelet_main_column_personal_other"></div> --></code>
        </p><script type="text/javascript">
big_pipe.onPageletArrive({"phase":1,"id":"pagelet_timeline_main_column","resource_map":{"\/zUlm":{"type":"js","src":"http:\/\/static.ak.fbcdn.net\/rsrc.php\/v1\/yo\/r\/U9HBCtRTNYV.js"}},"js":["Q6HMA","\/zUlm"],"onload":["TimelineController.init(\"40796308305\", \"app_106795496113635\");"],"content":{"pagelet_timeline_main_column":{"container_id":"uny39o_3"}}});
        </script>
        <p>
            <code class="hidden_elem" id="uny39o_7"><!-- <div class="fbTimelineTopSectionBase collapsedHead"><div id="above_header_timeline_placeholder"></div><div class="fbTimelineSection mtm fbTimelineTopSection"><div id="fbProfileCover"><div class="fbTimelineStickyHeader fixed_elem fbTimelineStickyHeaderVisible" id="uny39o_4"><div class="stickyHeaderWrap clearfix"><div class="back"></div><div class="name"><a class="profileThumb" href="http://www.facebook.com/coca-cola"><img class="uiProfilePhoto uiProfilePhotoLarge img" src="http://profile.ak.fbcdn.net/hprofile-ak-snc4/174560_40796308305_2093137831_q.jpg" alt="" /></a><span class="uiButtonGroup fbStickyHeaderBreadcrumb uiButtonGroupOverlay" id="uny39o_5"><span class="firstItem uiButtonGroupItem buttonItem"><a class="nameButton uiButton uiButtonOverlay" role="button" href="http://www.facebook.com/coca-cola"><span class="uiButtonText">Coca-Cola</span></a></span><span class="lastItem uiButtonGroupItem selectorItem"><div class="uiSelector inlineBlock pageMenu uiSelectorNormal uiSelectorDynamicLabel"><div class="wrap"><a class="pageMenuButton uiSelectorButton uiButton uiButtonOverlay" role="button" href="#" aria-haspopup="1" data-label="Your Stories" data-length="30" rel="toggle"><span class="uiButtonText">Your Stories</span></a><div class="uiSelectorMenuWrapper uiToggleFlyout"><div role="menu" class="uiMenu uiSelectorMenu"><ul class="uiMenuInner"><li class="uiMenuItem uiMenuItemRadio uiSelectorOption" data-label="Timeline"><a class="itemAnchor itemWithIcon" role="menuitemradio" tabindex="0" aria-checked="false" href="http://www.facebook.com/coca-cola"><i class="mrs itemIcon img sp_5y58i6 sx_df81cf"></i><span class="itemLabel fsm">Timeline</span></a></li><li class="uiMenuItem uiMenuItemRadio uiSelectorOption" data-label="About"><a class="itemAnchor itemWithIcon" role="menuitemradio" tabindex="-1" aria-checked="false" href="http://www.facebook.com/cocacola/info"><i class="mrs itemIcon img sp_46v94c 
sx_861aba"></i><span class="itemLabel fsm">About</span></a></li><li class="uiMenuSeparator separator hidden_elem"></li></ul></div></div></div><select><option value="">Your Stories</option><option value="Timeline">Timeline</option><option value="About">About</option></select></div></span><span class="uiButtonGroupItem selectorItem hidden_elem"><div class="uiSelector inlineBlock sectionMenu uiSelectorNormal uiSelectorDynamicLabel"><div class="wrap"><a class="uiSelectorButton uiButton uiButtonOverlay uiButtonNoText" role="button" href="#" aria-haspopup="1" data-length="30" rel="toggle"><span class="uiButtonText"></span></a><div class="uiSelectorMenuWrapper uiToggleFlyout"><div role="menu" class="uiMenu uiSelectorMenu"><ul class="uiMenuInner"><li class="uiMenuItem uiMenuItemRadio uiSelectorOption"><a class="itemAnchor" role="menuitemradio" tabindex="0" aria-checked="false" href="#" rel="ignore"><span class="itemLabel fsm"><span></span></span></a></li></ul></div></div></div><select><option value=""></option><option value=""></option></select></div></span><span class="uiButtonGroupItem selectorItem hidden_elem"><div class="uiSelector inlineBlock subsectionMenu uiSelectorNormal uiSelectorDynamicLabel"><div class="wrap"><a class="uiSelectorButton uiButton uiButtonOverlay" role="button" href="#" aria-haspopup="1" data-length="30" rel="toggle"><span class="uiButtonText">Highlights</span></a><div class="uiSelectorMenuWrapper uiToggleFlyout"><div role="menu" class="uiMenu uiSelectorMenu"><ul class="uiMenuInner"><li class="uiMenuItem uiMenuItemRadio uiSelectorOption highlights checked" data-label="Highlights"><a class="itemAnchor" role="menuitemradio" tabindex="0" aria-checked="true" href="#" rel="ignore"><span class="itemLabel fsm">Highlights</span></a></li><li class="uiMenuItem uiMenuItemRadio uiSelectorOption allStories" data-label="All Stories"><a class="itemAnchor" role="menuitemradio" tabindex="-1" aria-checked="false" href="#" rel="ignore"><span class="itemLabel fsm">All 
Stories</span></a></li><li class="uiMenuSeparator separator hidden_elem"></li></ul></div></div></div><select><option value=""></option><option value="highlights" selected="1">Highlights</option><option value="allStories">All Stories</option></select></div></span></span></div><div class="actions"><span class="uiButtonGroup fbTimelineConnectButtonGroup uiButtonGroupOverlay" id="uny39o_6"><span class="firstItem lastItem uiButtonGroupItem buttonItem"><a class="uiButton uiButtonOverlay uiButtonLarge" role="button" rel="dialog" href="/ajax/signup_dialog.php?page_id=40796308305&amp;next=http%3A%2F%2Fwww.facebook.com%2Fcoca-cola"><i class="mrs img sp_aoiw5d sx_b3340c"></i><span class="uiButtonText">Like</span></a></span></span></div></div></div></div><div id="pagelet_above_header_timeline" data-referrer="pagelet_above_header_timeline"></div></div></div><div id="timeline_tab_content"><div class="fbTimelineSection mtm pageAppTab"><div id="pagelet_app_106795496113635" data-referrer="pagelet_app_106795496113635"></div></div></div> --></code>
        </p><script type="text/javascript">
big_pipe.onPageletArrive({"phase":1,"id":"pagelet_main_column_personal","display_dependency":["pagelet_timeline_main_column"],"jsmods":{"require":[["Selector"]]},"css":["VhLvJ","sbVQp","noaQ6"],"resource_map":{"noaQ6":{"type":"css","src":"http:\/\/static.ak.fbcdn.net\/rsrc.php\/v1\/yZ\/r\/olLHMD8tyzn.css"}},"js":["Q6HMA","\/zUlm"],"onload":["TimelineStickyHeader.init($('uny39o_4'))","TimelineStickyHeaderNav.init($('uny39o_5'), {\"custom_subsection_menu\":true})"],"content":{"pagelet_main_column_personal":{"container_id":"uny39o_7"}}});
        </script>
        <p>
            <code class="hidden_elem" id="uny39o_8"><!-- <div class="timeline"></div> --></code>
        </p><script type="text/javascript">
big_pipe.onPageletArrive({"phase":1,"id":"pagelet_above_header_timeline","display_dependency":["pagelet_main_column_personal"],"is_last":true,"js":["Q6HMA"],"content":{"pagelet_above_header_timeline":{"container_id":"uny39o_8"}}});
        </script>
        <p>
            <code class="hidden_elem" id="uny39o_9"><!-- <div><div id="pagelet_app_runner" data-referrer="pagelet_app_runner"></div></div> --></code>
        </p><script type="text/javascript">
big_pipe.onPageletArrive({"phase":2,"id":"pagelet_app_106795496113635","is_last":true,"has_inline_js":true,"css":["a9lBH"],"bootloadable":{"legacy:dom":{"resources":["Q6HMA"]}},"resource_map":{"a9lBH":{"type":"css","permanent":1,"src":"http:\/\/static.ak.fbcdn.net\/rsrc.php\/v1\/yE\/r\/I3KZ7jvU7mg.css"}},"js":["Q6HMA"],"content":{"pagelet_app_106795496113635":{"container_id":"uny39o_9"}},"tti_phase":2});
        </script>
        <p>
            <code class="hidden_elem" id="uny39o_10"><!-- <iframe name="app_runner_4f62b2364c9a97c52840233" id="app_runner_4f62b2364c9a97c52840233" style="width:810px;height:800px;" frameborder="0" src="http://static.ak.facebook.com/platform/page_proxy.php?v=4#app_runner_4f62b2364c9a97c52840233"></iframe> --></code>
        </p><script type="text/javascript">
big_pipe.onPageletArrive({"phase":3,"id":"pagelet_app_runner","is_last":true,"is_second_to_last_phase":true,"has_inline_js":true,"css":["sbVQp"],"resource_map":{"4oS\/B":{"type":"js","src":"http:\/\/static.ak.fbcdn.net\/rsrc.php\/v1\/y7\/r\/G6XO6uDjJ6r.js"},"OhsqH":{"type":"js","src":"http:\/\/static.ak.fbcdn.net\/rsrc.php\/v1\/yT\/r\/MIrvOITsbj6.js"}},"js":["Q6HMA","4oS\/B","OhsqH"],"onload":["PlatformAppController.init({\"name\":\"app_runner_4f62b2364c9a97c52840233\",\"config\":[],\"appTabUrl\":\"http:\\\/\\\/assets.facebook.coca-cola.com\\\/contentstore\\\/globaltab\\\/facebook\\\/tabs\\\/attribution\\\/\",\"signedRequest\":\"-fDEufyKDWg38NUBKHDoj0nQHnew1Wh92G75b89OXfg.eyJhbGdvcml0aG0iOiJITUFDLVNIQTI1NiIsImlzc3VlZF9hdCI6MTMzMTg2ODIxNCwicGFnZSI6eyJpZCI6IjQwNzk2MzA4MzA1IiwibGlrZWQiOmZhbHNlLCJhZG1pbiI6ZmFsc2V9LCJ1c2VyIjp7ImNvdW50cnkiOiJ1cyIsImxvY2FsZSI6ImVuX1VTIiwiYWdlIjp7Im1pbiI6MCwibWF4IjoxMn19fQ\"})"],"content":{"pagelet_app_runner":{"container_id":"uny39o_10"}}});
        </script><script type="text/javascript">
big_pipe.onPageletArrive({"phase":4,"id":"","is_last":true,"the_end":true,"css":["VhLvJ","sbVQp","0NL5c"],"js":["Q6HMA","y3kOn"]});
        </script>
    </body>
</html>

1 Answer:

Answer 0 (score: 3)

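The second snippet looks like the final HTML after it has been dynamically generated. To verify, compare what you get over HTTP with what Firebug shows as the final DOM. You have a few options:

  1. Reverse engineer the JavaScript code and write Python code to emulate its behavior
  2. Use a real DOM-aware browser environment such as Selenium, or my own library, dryscrape, which uses QtWebkit and is more lightweight and faster (but has only been tested on Linux).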
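
For this particular page, a very rough stand-in for option 1 might be to skip the JavaScript entirely and read the markup straight out of the HTML comments, since the dump in the question shows the pagelet content sitting inside `<!-- ... -->` blocks. The sketch below is only an illustration under that assumption: it uses the standard library's `html.parser` rather than BeautifulSoup, the class and function names are made up for the example, and the sample string is a trimmed fragment of the HTML posted above. It would not reach anything the page later loads into the app iframe.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collect the raw text of every HTML comment in a document."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data)


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> also end up here, because the
        # default handle_startendtag() delegates to handle_starttag().
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def img_sources_in_comments(html):
    """Return img src values found inside HTML comments (hypothetical helper)."""
    outer = CommentCollector()
    outer.feed(html)
    outer.close()

    sources = []
    for comment in outer.comments:
        # Parse each comment body again as if it were ordinary markup.
        inner = ImgSrcCollector()
        inner.feed(comment)
        inner.close()
        sources.extend(inner.sources)
    return sources


# Trimmed fragment of the commented-out markup from the question.
sample = (
    '<code class="hidden_elem" id="uny39o_7"><!-- <a class="profileThumb" '
    'href="http://www.facebook.com/coca-cola"><img class="uiProfilePhoto" '
    'src="http://profile.ak.fbcdn.net/hprofile-ak-snc4/'
    '174560_40796308305_2093137831_q.jpg" alt="" /></a> --></code>'
)

print(img_sources_in_comments(sample))
```

The same two-pass idea should carry over to BeautifulSoup, which exposes comment nodes as `Comment` objects that can be fed into a second `BeautifulSoup(...)` call, but that variant is not shown or tested here.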