我一直在BS4中构建一个网络刮刀并且卡住了。我正在使用Trip Advisor作为我将要追踪的其他数据的测试,但我无法隔离“整个”评论的标签。这是一个例子:
https://www.tripadvisor.com/Restaurant_Review-g56010-d470148-Reviews-Chez_Nous-Humble_Texas.html
请注意,在第一次审核中,“酒单是......”下方有一个图标。我能够轻松地隔离部分评论,但是在模拟“更多”点击后,无法找到让BS4拉取评论的方法。我想弄清楚需要哪些工具?我需要使用硒吗?
原始元素如下所示:
<span class="partnerRvw">
<span class="taLnk hvrIE6 tr475091998 moreLink ulBlueLinks" onclick=" ta.util.cookie.setPIDCookie(4444); ta.call('ta.servlet.Reviews.expandReviews', {type: 'dummy'}, ta.id('review_475091998'), 'review_475091998', '1', 4444);
">
More </span>
<span class="ui_icon caret-down"></span>
</span>
点击“更多”链接后查看HTML,您会发现一个新的动态添加的类,其中包含我需要的信息(见下文):
<div class="review dyn_full_review inlineReviewUpdate provider0 first newFlag" style="display: block;">
<a name="UR475091998" class=""></a>
<div id="UR475091998" class="extended provider0 first newFlag">
<div class="col1of2">
<div class="member_info">
<div id="UID_6875524F623CC948F4F9CA95BB4A9567-SRC_475091998" class="memberOverlayLink" onmouseover="requireCallIfReady('members/memberOverlay', 'initMemberOverlay', event, this, this.id, 'Reviews', 'user_name_photo');" data-anchorwidth="90">
<div class="avatar profile_6875524F623CC948F4F9CA95BB4A9567 ">
<a onclick="">
<img src="https://media-cdn.tripadvisor.com/media/photo-l/0d/97/43/bf/joannecarpenter.jpg" class="avatar potentialFacebookAvatar avatarGUID:6875524F623CC948F4F9CA95BB4A9567" width="74" height="74">
</a>
</div>
<div class="username mo">
<span class="expand_inline scrname mbrName_6875524F623CC948F4F9CA95BB4A9567" onclick="ta.trackEventOnPage('Reviews', 'show_reviewer_info_window', 'user_name_name_click')">joannecarpenter</span>
</div>
</div>
<div class="location">
Humble, Texas
</div>
</div>
<div class="memberBadging g10n">
<div id="UID_6875524F623CC948F4F9CA95BB4A9567-CONT" class="no_cpu" onclick="ta.util.cookie.setPIDCookie('15984'); requireCallIfReady('members/memberOverlay', 'initMemberOverlay', event, this, this.id, 'Reviews', 'review_count');" data-anchorwidth="90">
<div class="levelBadge badge lvl_02">
Level <span><img src="https://static.tacdn.com/img2/badges/20px/lvl_02.png" alt="" class="icon" width="20" height="20/"></span> Contributor </div>
<div class="reviewerBadge badge">
<img src="https://static.tacdn.com/img2/badges/20px/rev_03.png" alt="" class="icon" width="20" height="20">
<span class="badgeText">6 reviews</span> </div>
<div class="contributionReviewBadge badge">
<img src="https://static.tacdn.com/img2/badges/20px/Foodie.png" alt="" class="icon" width="20" height="20">
<span class="badgeText">6 restaurant reviews</span>
</div>
</div>
</div>
</div>
<div class="col2of2">
<div class="innerBubble">
<div class="quote"><a href="/ShowUserReviews-g56010-d470148-r475091998-Chez_Nous-Humble_Texas.html#CHECK_RATES_CONT" onclick="ta.setEvtCookie('Reviews','title','',0,this.href); setPID();" id="r475091998">“<span class="noQuotes">Dinner</span>”</a></div>
<div class="rating reviewItemInline">
<span class="rate sprite-rating_s rating_s"> <img class="sprite-rating_s_fill rating_s_fill s50" width="70" src="https://static.tacdn.com/img2/x.gif" alt="5 of 5 bubbles">
</span>
<span class="ratingDate relativeDate" title="April 12, 2017">Reviewed 3 days ago
<span class="new redesigned">NEW</span> </span>
<a class="viaMobile" href="/apps" target="_blank" onclick="ta.util.cookie.setPIDCookie(24687)">
<span class="ui_icon mobile-phone"></span>
via mobile
</a>
</div>
<div class="entry">
<p>
Our favorite restaurant in Houston. Definitely the best and friendliest service! The food is not only served with a flair, it is absolutely delicious. My favorite is the Lamb. It is the best! Also the duck moose, fois gras, the crispy salad and the French onion soup are all spectacular! This is a must try restaurant! The wine list is fantastic. Just ask Daniel for suggestions. He not only knows his wines; he loves what he does! We Love this place!
</p>
</div>
<div class="rating-list">
<div class="recommend">
<span class="recommend-titleInline noRatings">Visited April 2017</span>
</div>
</div>
<div class="expanded lessLink">
<span class="taLnk collapse ulBlueLinks no_cpu ">
Less
</span>
<span class="textArrow_more ui_icon caret-up"></span>
</div>
<div id="helpfulq475091998_expanded" class="helpful redesigned white_btn_container ">
<span class="isHelpful">Helpful?</span> <div class="tgt_helpfulq475091998 rnd_white_thank_btn" onclick="ta.call('ta.servlet.Reviews.helpfulVoteHandlerOb', event, this, 'LeJIVqd4EVIpECri1GII2t6mbqgqguuuxizSxiniaqgeVtIJpEJCIQQoqnQQeVsSVuqHyo3KUKqHMdkKUdvqHxfqHfGVzCQQoqnQQZiptqH5paHcVQQoqnQQrVxEJtxiGIac6XoXmqoTpcdkoKAUAAv0tEn1dkoKAUAAv0zH1o3KUK0pSM13vkooXdqn3XmffAdvqndqnAfbAo77dbAo3k0npEEeJIV1K0EJIVqiJcpV1U0Ii9VC1rZlU3XozxbZZxE2crHN2TDUJiqnkiuzsVEOxdkXqi7TxXpUgyR2xXvOfROwaqILkrzz9MvzCxMva7xEkq8xXNq8ymxbAq8AzzrhhzCxbx2vdNvEn2fnwEfq8alzCeqi53ZrgnMrHhshTtowGpNSmq89IwiVb7crUJxdevaCnJEqI33qiE5JGErJExXKx5ooItGCy5wnCTx2VA7RvxEsO3'); ta.trackEventOnPage('HELPFUL_VOTE_TEST', 'helpfulvotegiven_v2');">
<img src="https://static.tacdn.com/img2/icons/icon_thumb_white.png" class="helpful_thumbs_up white">
<img src="https://static.tacdn.com/img2/icons/icon_thumb_green.png" class="helpful_thumbs_up green">
<span class="helpful_text">Thank joannecarpenter</span> </div>
</div>
<div class="tooltips vertically_centered">
<div class="reportProblem">
<span id="ReportIAP_475091998" class="problem collapsed taLnk" onclick="ta.trackEventOnPage('Report_IAP', 'Report_Button_Clicked', 'member'); ta.call('ta.servlet.Reviews.iapFlyout', event, this, '475091998')" onmouseover="if (!this.getAttribute('data-first')) {ta.trackEventOnPage('Reviews', 'report_problem', 'hover_over_flag'); this.setAttribute('data-first', 1)} uiOverlay(event, this)" data-tooltip="" data-position="above" data-content="Problem with this review?">
<img src="https://static.tacdn.com/img2/icons/gray_flag.png" width="13" height="14" alt="">
<span class="reportTxt">Report</span> </span>
</div>
</div>
<div class="userLinks">
<div class="sameGeoActivity">
<a href="/members-citypage/joannecarpenter/g56010" target="_blank" onclick="ta.setEvtCookie('Reviews','more_reviews_by_user','',0,this.href); ta.util.cookie.setPIDCookie(19160)">
See all 5 reviews by joannecarpenter for Humble </a>
</div>
<div class="askQuestion">
<span class="taLnk ulBlueLinks" onclick="ta.trackEventOnPage('answers_review','ask_user_intercept_click' ); ta.load('ta-answers', (function() {require('answers/misc').askReviewerIntercept(this, '470148', 'joannecarpenter', '6875524F623CC948F4F9CA95BB4A9567', 'en', '475091998','Chez Nous', 39151)}).bind(this), true);">Ask joannecarpenter about Chez Nous</span>
</div>
</div>
<div class="note">
This review is the subjective opinion of a TripAdvisor member and not of TripAdvisor LLC. </div>
<div class="duplicateReviewsInline">
<div class="previous">joannecarpenter has 1 more review of Chez Nous</div> <ul class="dupReviews">
<li class="dupReviewItem">
<div class="reviewTitle">
<a href="/ShowUserReviews-g56010-d470148-r453237869-Chez_Nous-Humble_Texas.html#REVIEWS">“Joanne Carpenter”</a>
</div>
<div class="rating">
<span class="rate sprite-rating_ss rating_ss"> <img class="sprite-rating_ss_fill rating_ss_fill ss50" width="50" src="https://static.tacdn.com/img2/x.gif" alt="5 of 5 bubbles">
</span>
<span class="date">Reviewed January 18, 2017</span>
</div>
</li>
</ul>
</div>
</div>
</div>
</div>
<div class="large">
</div>
<div class="ad iab_inlineBanner">
<div id="gpt-ad-468x60" class="adInner gptAd"></div>
</div>
</div>
BS4有办法为我处理这个问题吗?
答案 0 :(得分:1)
这是一个让你入门的简单例子:
import selenium
from selenium import webdriver
driver = webdriver.PhantomJS()
url = "https://www.tripadvisor.com/Restaurant_Review-g56010-d470148-Reviews-Chez_Nous-Humble_Texas.html"
driver.get(url)
elem = driver.get_element_by_class_name("taLnk")
...
您可以在此处找到有关这些方法的更多信息: http://selenium-python.readthedocs.io/
答案 1 :(得分:0)
您很可能需要检查其中一些页面,以识别HTML代码中的变体。对于您提供的样本,并且您可以通过模拟印刷机获得它,以下代码可用于选择您想要的段落。
padding
结果:
我们在休斯顿最喜欢的餐厅。绝对是最好,最友善的服务!食物不仅具有天赋,还绝对美味。我最喜欢的是羔羊。这是最好的!鸭麋,鹅肝,脆皮沙拉和法式洋葱汤都很棒!这是一个必须尝试的餐厅!酒单太棒了。只要问丹尼尔的建议。他不仅了解他的葡萄酒;他喜欢他的所作所为!我们喜欢这个地方!
请注意,段落之前和之后都有换行符。