Question

I want to scrape the star ratings from this website: http://www.pixel.ir/canon/3071-canon-eos-700d-18-55-is-stm.html

I can scrape the text, but the star rating is displayed as images. In the page source, the stars are radio buttons:

<div class="gsrReviewLineRating" id="gsrDisplayRating1">
  <input class="star" type="radio" id="gsrRating1" name="gsrRating1" value="1" />
  <input class="star" type="radio" id="gsrRating1" name="gsrRating1" value="2" />
  <input class="star" type="radio" id="gsrRating1" name="gsrRating1" value="3" />
  <input class="star" type="radio" id="gsrRating1" name="gsrRating1" value="4" />
  <input class="star" type="radio" id="gsrRating1" name="gsrRating1" value="5" checked="checked"/>
</div>

I tried to scrape it with Mozenda but couldn't. Is there a way to scrape it? Is there other software for this purpose?

Answer 1

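You can do this easily with Mozenda. If you copy the XML below and paste it into an agent, it will create the action that captures the star rating. To learn how to capture it across multiple pages, see http://mozenda.com/help/helptopic?TopicID=149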

<!--- - - - - - - - Actions - - - - - - - - -->
<ActionList>
  <Action>
    <ActionType>GetElementValue</ActionType>
    <Page>1</Page>
    <FieldExpression>value=&quot;%Star Rating%&quot;</FieldExpression>
    <FieldExpression>value='%Star Rating%'</FieldExpression>
    <FieldExpression>value=%Star Rating% </FieldExpression>
    <FieldExpression>value=%Star Rating%&gt;</FieldExpression>
    <ItemType>PlaceHolder</ItemType>
    <ItemXPath>//input[@name=&quot;gsrAverageRating&quot;][5]</ItemXPath>
    <ID>gsrAverageRating</ID>
    <Name>gsrAverageRating</Name>
    <FieldValueType>Outer</FieldValueType>
  </Action>
</ActionList>
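Before pasting an XPath like this into an agent, it can help to check what it actually selects. A positional step such as `[5]` always lands on the fifth radio button, whichever star is checked, while an attribute test such as `[@checked]` follows the selected one. A minimal sketch using Python's standard-library `xml.etree.ElementTree` (which supports only a small XPath subset and needs well-formed markup, so it suits a snippet rather than the full page), run against a hypothetical variant of the question's markup with the third star checked:

```python
import xml.etree.ElementTree as ET

# Hypothetical variant of the question's snippet: the 3rd star is checked,
# so the two expressions below give different answers.
SNIPPET = """<div class="gsrReviewLineRating" id="gsrDisplayRating1">
  <input class="star" type="radio" name="gsrRating1" value="1"/>
  <input class="star" type="radio" name="gsrRating1" value="2"/>
  <input class="star" type="radio" name="gsrRating1" value="3" checked="checked"/>
  <input class="star" type="radio" name="gsrRating1" value="4"/>
  <input class="star" type="radio" name="gsrRating1" value="5"/>
</div>"""

root = ET.fromstring(SNIPPET)

# Positional step: always the fifth <input>, regardless of which is checked.
positional = root.find(".//input[5]").get("value")

# Attribute test: the <input> that carries a checked attribute.
checked = root.find(".//input[@checked]").get("value")

print(positional, checked)  # 5 3
```

When the fifth star happens to be the checked one (as in the question), both expressions return the same value, which is why a positional expression can look correct on a single test page.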

Answer 2

You have to read the value of the input that is checked:

<input ... value="5" checked="checked"/>

I used Python with the requests and lxml modules:

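import requests
import lxml.html  # element.cssselect() also needs the separate "cssselect" package

r = requests.get('http://www.pixel.ir/canon/3071-canon-eos-700d-18-55-is-stm.html')
r.raise_for_status()

# Parse the raw bytes so lxml can apply the encoding the page declares
html = lxml.html.fromstring(r.content)

# Select the rating radio button that carries the "checked" attribute
checked = html.cssselect('input#gsrAverageRating[checked]')

if checked:
    print(checked[0].get('value'))
else:
    print("No stars")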
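If installing `lxml` and `cssselect` is inconvenient, the same idea (find the radio input that carries `checked` and read its `value`) can be sketched with only the standard library's `html.parser`. The names `CheckedRadioParser` and `checked_rating` are hypothetical, and the sketch assumes the rating inputs are `type="radio"` with a shared `name`, as in the question's snippet; the live page may use a different name.

```python
from html.parser import HTMLParser


class CheckedRadioParser(HTMLParser):
    """Collect the value of each checked radio input with the given name."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a bare "checked" maps to None.
        attrs = dict(attrs)
        if (tag == "input" and attrs.get("type") == "radio"
                and attrs.get("name") == self.name and "checked" in attrs):
            self.values.append(attrs.get("value"))
    # Self-closing tags such as <input ... /> go to handle_startendtag, whose
    # default implementation calls handle_starttag, so they are covered too.


def checked_rating(html_text, name):
    """Return the value of the first checked radio named `name`, or None."""
    parser = CheckedRadioParser(name)
    parser.feed(html_text)
    parser.close()
    return parser.values[0] if parser.values else None


# The question's snippet (5th star checked).
SNIPPET = (
    '<div class="gsrReviewLineRating" id="gsrDisplayRating1">'
    '<input class="star" type="radio" id="gsrRating1" name="gsrRating1" value="1" />'
    '<input class="star" type="radio" id="gsrRating1" name="gsrRating1" value="2" />'
    '<input class="star" type="radio" id="gsrRating1" name="gsrRating1" value="3" />'
    '<input class="star" type="radio" id="gsrRating1" name="gsrRating1" value="4" />'
    '<input class="star" type="radio" id="gsrRating1" name="gsrRating1" value="5" checked="checked"/>'
    '</div>'
)

print(checked_rating(SNIPPET, "gsrRating1"))  # 5
```

To run it against the live page, pass the downloaded HTML instead, e.g. `checked_rating(requests.get(url).text, "gsrRating1")`; which `name` to use depends on the page's actual markup.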

Scraping a website's star rating with Mozenda
