如何提取亚马逊评论?

时间:2018-09-29 07:25:17

标签: web-scraping scrapy web-crawler amazon python-3.7

我想提取亚马逊评论及其所有相关数据,例如:评论者的姓名,评分,内容和(如果可能)评论以回应该评论。 我正在使用python 3.7。

1 个答案:

答案 0 :(得分:0)

您可以通过两种方法实现这一目标:

  1. API(快速可靠)

向Amazon咨询API

  1. 无头铬或硒之类的工具 check this post

在页面中查找DOM元素,如

<div id="R1IZDPP09RA69A" data-hook="review" class="a-section review">
    <div id="customer_review-R1IZDPP09RA69A" class="a-section celwidget" data-cel-widget="customer_review-R1IZDPP09RA69A">
        <div class="a-row a-spacing-mini">
            <a href="/gp/profile/amzn1.account.AGFB356ZQJQAHWZZABTTNIFGYMDA/ref=cm_cr_dp_d_gw_tr?ie=UTF8" class="a-profile" data-a-size="small">
                <div aria-hidden="true" class="a-profile-avatar-wrapper">
                    <div class="a-profile-avatar">
                        <img src="https://images-eu.ssl-images-amazon.com/images/S/amazon-avatars/f0d86e6d-45d4-4cc7-a4b8-a062450c2c75._CR0,0,335,335_SX48_.jpg" class="" data-src="https://images-eu.ssl-images-amazon.com/images/S/amazon-avatars/f0d86e6d-45d4-4cc7-a4b8-a062450c2c75._CR0,0,335,335_SX48_.jpg">
                        <noscript><img src="https://images-eu.ssl-images-amazon.com/images/S/amazon-avatars/f0d86e6d-45d4-4cc7-a4b8-a062450c2c75._CR0,0,335,335_SX48_.jpg"></noscript>
                    </div>
                </div>
                <div class="a-profile-content"><span class="a-profile-name">Sana Khanam</span></div>
            </a>
        </div>
        <div class="a-row"><a class="a-link-normal" title="4.0 out of 5 stars" href="/gp/customer-reviews/R1IZDPP09RA69A/ref=cm_cr_dp_d_rvw_ttl?ie=UTF8&amp;ASIN=B079Q9VNWQ"><i data-hook="review-star-rating" class="a-icon a-icon-star a-star-4 review-rating"><span class="a-icon-alt">4.0 out of 5 stars</span></i></a><span class="a-letter-space"></span><a data-hook="review-title" class="a-size-base a-link-normal review-title a-color-base a-text-bold" href="/gp/customer-reviews/R1IZDPP09RA69A/ref=cm_cr_dp_d_rvw_ttl?ie=UTF8&amp;ASIN=B079Q9VNWQ">Amazing service by VOLTAS !!Appreciated </a></div>
        <span data-hook="review-date" class="a-size-base a-color-secondary review-date">17 May 2018</span>
        <div class="a-row a-spacing-mini review-data review-format-strip"><span data-hook="avp-badge-linkless" class="a-size-mini a-color-state a-text-bold">Verified Purchase</span></div>
        <div class="a-row a-spacing-small review-data">
            <span data-hook="review-body" class="a-size-base review-text">
                <div aria-live="polite" data-a-expander-name="review_text_read_more" data-a-expander-collapsed-height="300" class="a-expander-collapsed-height a-row a-expander-container a-expander-partial-collapse-container" style="max-height: none; height: 300px;">
                    <div data-hook="review-collapsed" aria-expanded="false" class="a-expander-content a-expander-partial-collapse-content" style="padding-bottom: 19px;">5 th day after having AC installed.<br>PROS:<br><br>-Cooling nice<br>-Looks good<br>-Good one in this price range<br>-Fast Installation within 24h<br>- Customer support response appriciated<br><br>CONS<br>-Started making weird little noises while decreasing temperature.<br><br>-I don't understand but some unpleasant smell being diffused after starting it at the 5th day of installation.<br><br>-Contacted seller, issue resolved!<br><br>Overall would RECOMMEND!! Go for  it.</div>
                    <div class="a-expander-header a-expander-partial-collapse-header" style="opacity: 1; display: block;">
                        <div class="a-expander-content-fade"></div>
                        <a href="javascript:void(0)" data-hook="expand-collapse-read-more-less" data-action="a-expander-toggle" class="a-declarative" data-a-expander-toggle="{&quot;allowLinkDefault&quot;:true, &quot;expand_prompt&quot;:&quot;Read more&quot;, &quot;collapse_prompt&quot;:&quot;Read less&quot;}"><i class="a-icon a-icon-extender-expand"></i><span class="a-expander-prompt">Read more</span></a>
                    </div>
                </div>
            </span>
        </div>
        <div data-hook="review-comments" class="a-row review-comments">
            <span class="cr-vote" data-hook="review-voting-widget">
                <div class="a-row a-spacing-small"><span data-hook="helpful-vote-statement" class="a-size-base a-color-tertiary cr-vote-text">5 people found this helpful</span></div>
                <div class="cr-helpful-button aok-float-left">
                    <span class="a-button a-button-base" id="a-autoid-14">
                        <span class="a-button-inner">
                            <a href="https://www.amazon.in/ap/signin?openid.return_to=https%3A%2F%2Fwww.amazon.in%2Fdp%2FB079Q9VNWQ%2Fref%3Dcm_cr_dp_d_vote_lft%3Fie%3DUTF8%26voteInstanceId%3DR1IZDPP09RA69A%26voteValue%3D1%26csrfT%3DgiYnW3e%252Fv7p8y07m5Je2hA3LGXQ2gVKHQWzD%252F40AAAAJAAAAAFuvcotyYXcAAAAAFVfwLBerPie4v1Ep%252F%252F%252F%252F%23R1IZDPP09RA69A&amp;openid.identity=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0%2Fidentifier_select&amp;openid.claimed_id=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0%2Fidentifier_select&amp;openid.assoc_handle=inflex&amp;openid.mode=checkid_setup&amp;openid.ns=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0" data-hook="vote-helpful-button" class="a-button-text" role="button" id="a-autoid-14-announce">
                                <div class="cr-helpful-text">
                                    Helpful
                                </div>
                            </a>
                        </span>
                    </span>
                </div>
            </span>
            <i class="a-icon a-icon-text-separator" role="img" aria-label="|"></i><a data-hook="review-comment" class="a-size-base a-link-normal a-color-secondary a-text-normal" href="/gp/customer-reviews/R1IZDPP09RA69A/ref=cm_cr_dp_d_rvw_btm?ie=UTF8&amp;ASIN=B079Q9VNWQ#wasThisHelpful">Comment</a><span class="cr-footer-line-height">
            <span><i class="a-icon a-icon-text-separator" role="img" aria-label="|"></i><span class="a-declarative" data-action="cr-popup" data-cr-popup="{&quot;width&quot;:&quot;580&quot;,&quot;title&quot;:&quot;ReportAbuse&quot;,&quot;url&quot;:&quot;/hz/reviews-render/report-abuse?ie=UTF8&amp;voteDomain=Reviews&amp;ref=cm_cr_dp_d_rvw_hlp&amp;csrfT=giYnW3e%2Fv7p8y07m5Je2hA3LGXQ2gVKHQWzD%2F40AAAAJAAAAAFuvcotyYXcAAAAAFVfwLBerPie4v1Ep%2F%2F%2F%2F&amp;entityId=R1IZDPP09RA69A&amp;sessionId=257-1905223-9805712&quot;,&quot;height&quot;:&quot;380&quot;}"><a class="a-size-base a-link-normal a-color-secondary report-abuse-link a-text-normal" href="/hz/reviews-render/report-abuse?ie=UTF8&amp;voteDomain=Reviews&amp;ref=cm_cr_dp_d_rvw_hlp&amp;csrfT=giYnW3e%2Fv7p8y07m5Je2hA3LGXQ2gVKHQWzD%2F40AAAAJAAAAAFuvcotyYXcAAAAAFVfwLBerPie4v1Ep%2F%2F%2F%2F&amp;entityId=R1IZDPP09RA69A&amp;sessionId=257-1905223-9805712">Report abuse</a></span></span></span>
        </div>
    </div>
</div>

在这里您可以找到评论者姓名等级,并在每个具有属性div的{​​{1}}下进行评论。

data-hook="review"下的名称

评级<span class="a-profile-name">Sana Khanam</span>

查看<span class="a-icon-alt">4.0 out of 5 stars</span>