How to check whether a specific button exists in Scrapy?

Date: 2014-07-10 08:43:56

Tags: scrapy scrapy-spider

I have a button on a web page:

 <input class="nextbutton" type="submit" name="B1" value="Next 20>>"></input> 

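Now I want to use an XPath selector to check whether this button exists on the page, so that if it does, I can go to the next page and retrieve information from there.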

3 answers:

Answer 0 (score: 4)

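First, you have to decide what counts as "this button". Given the context, I'd suggest looking for an input with the class 'nextbutton'. In XPath, you can check for an element whose class attribute is exactly that one class: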

//input[@class='nextbutton']

But that only finds an exact match; an element with class="nextbutton big" would be missed. So you could try:

//input[contains(@class, 'nextbutton')]

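However, that also matches "nonextbutton" or "nextbuttonbig". Padding the attribute with a space on both sides turns the substring test into a whole-token test, so your final answer is probably: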

//input[contains(concat(' ', @class, ' '), ' nextbutton ')]
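To make the difference between the three expressions concrete, here is a plain-Python sketch (no Scrapy involved) that applies the same string logic each predicate uses to a few sample class values. The function names are hypothetical. normalized_token_match is an extra variant modelled on the normalize-space(@class) form that Scrapy generates for the CSS selector input.nextbutton; unlike the hand-written concat() version, it also tolerates tabs or newlines between class names.

```python
# Plain-Python mimic of the XPath class predicates, applied to a class attribute value.
# Only the string logic is reproduced here; no HTML parsing or Scrapy is involved.

def exact_match(class_attr: str) -> bool:
    # //input[@class='nextbutton']
    return class_attr == "nextbutton"

def substring_match(class_attr: str) -> bool:
    # //input[contains(@class, 'nextbutton')]
    return "nextbutton" in class_attr

def token_match(class_attr: str) -> bool:
    # //input[contains(concat(' ', @class, ' '), ' nextbutton ')]
    return " nextbutton " in " " + class_attr + " "

def normalized_token_match(class_attr: str) -> bool:
    # Same test after collapsing whitespace, like normalize-space(@class);
    # tabs, newlines and repeated spaces between classes no longer break the match.
    return " nextbutton " in " " + " ".join(class_attr.split()) + " "

if __name__ == "__main__":
    samples = ["nextbutton", "nextbutton big", "nonextbutton", "nextbuttonbig", "big\tnextbutton"]
    for value in samples:
        print(repr(value), exact_match(value), substring_match(value),
              token_match(value), normalized_token_match(value))
```

Only the token-based variants reject "nonextbutton" and "nextbuttonbig" while still accepting "nextbutton big".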

In Scrapy, .xpath() returns a SelectorList, which is a list and therefore evaluates as true only if it matched at least one node. So you should be able to write something like:

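from scrapy.selector import Selector

# html_content is the page's HTML as a string; inside a spider you can call response.xpath(...) directly
input_tag = Selector(text=html_content).xpath("//input[contains(concat(' ', @class, ' '), ' nextbutton ')]")
if input_tag:  # a non-empty SelectorList is truthy
    print("Yes, I found a 'next' button on the page.")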
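If you want to check the same token-matching logic outside Scrapy, here is a stdlib-only sketch using html.parser. NextButtonFinder and find_next_button are hypothetical names; the sketch simply returns the attributes of the first input whose class list contains "nextbutton", or None when there is no such button.

```python
from html.parser import HTMLParser
from typing import Optional

class NextButtonFinder(HTMLParser):
    """Records the attributes of the first <input> whose class list contains 'nextbutton'."""

    def __init__(self) -> None:
        super().__init__()
        self.button: Optional[dict] = None

    def handle_starttag(self, tag, attrs):
        if self.button is None and tag == "input":
            attr_map = dict(attrs)
            # Valueless attributes come through as None, hence the "or ''"
            classes = (attr_map.get("class") or "").split()
            if "nextbutton" in classes:
                self.button = attr_map

def find_next_button(html: str) -> Optional[dict]:
    finder = NextButtonFinder()
    finder.feed(html)
    finder.close()
    return finder.button

html = '<input class="nextbutton" type="submit" name="B1" value="Next 20>>"></input>'
button = find_next_button(html)
if button:
    # A submit button sends its own name=value pair with the form when it is clicked
    print("found:", button["name"], "=", button["value"])
```

Because the button is a form submit control, in Scrapy itself "pressing" it would typically be FormRequest.from_response(response, clickdata={'name': 'B1'}), which builds the request from the enclosing form and should include the clicked button's name/value pair. This assumes the button sits inside a form on that page.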

Answer 1 (score: 0)

http://www.trumed.org/patients-visitors/find-a-doctor uses an iframe that loads src="http://verify.tmcmed.org/iDirectory/":
<iframe border="0" frameborder="0" id="I1" name="I1"
        src="http://verify.tmcmed.org/iDirectory/"
        style="width: 920px; height: 600px;" target="I1">
Your browser does not support inline frames or is currently configured not to display inline frames.
</iframe>

The search form lives inside this iframe, so you have to request the iframe's URL rather than the outer page.
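The key step is reading the iframe's src and requesting that URL instead of the outer page. In Scrapy that is response.xpath('//iframe/@src').extract() followed by a new request; the stdlib-only sketch below performs the same extraction without Scrapy so the idea can be checked in isolation. IframeSrcCollector and iframe_urls are hypothetical names, and urljoin is used so that relative src values are resolved against the page URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class IframeSrcCollector(HTMLParser):
    """Collects the src attribute of every <iframe> element."""

    def __init__(self) -> None:
        super().__init__()
        self.srcs: list = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)

def iframe_urls(html: str, base_url: str) -> list:
    """Return the absolute URLs of the iframes in html, resolved against base_url."""
    collector = IframeSrcCollector()
    collector.feed(html)
    collector.close()
    return [urljoin(base_url, src) for src in collector.srcs]

page = '''<iframe border="0" frameborder="0" id="I1" name="I1"
        src="http://verify.tmcmed.org/iDirectory/"
        style="width: 920px; height: 600px;" target="I1">
Your browser does not support inline frames.
</iframe>'''
print(iframe_urls(page, "http://www.trumed.org/patients-visitors/find-a-doctor"))
```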

Here is a scrapy shell session that illustrates this:

$ scrapy shell "http://www.trumed.org/patients-visitors/find-a-doctor"
2014-07-10 11:31:05+0200 [scrapy] INFO: Scrapy 0.24.2 started (bot: scrapybot)
2014-07-10 11:31:07+0200 [default] DEBUG: Crawled (200) <GET http://www.trumed.org/patients-visitors/find-a-doctor> (referer: None)
...

In [1]: response.xpath('//iframe/@src').extract()
Out[1]: [u'http://verify.tmcmed.org/iDirectory/']

In [2]: fetch('http://verify.tmcmed.org/iDirectory/')
2014-07-10 11:31:34+0200 [default] DEBUG: Redirecting (302) to <GET http://verify.tmcmed.org/iDirectory/applicationspecific/intropage.asp> from <GET http://verify.tmcmed.org/iDirectory/>
2014-07-10 11:31:35+0200 [default] DEBUG: Redirecting (302) to <GET http://verify.tmcmed.org/iDirectory/applicationspecific/search.asp> from <GET http://verify.tmcmed.org/iDirectory/applicationspecific/intropage.asp>
2014-07-10 11:31:36+0200 [default] DEBUG: Crawled (200) <GET http://verify.tmcmed.org/iDirectory/applicationspecific/search.asp> (referer: None)
...

In [3]: from scrapy.http import FormRequest

In [4]: frq = FormRequest.from_response(response, formdata={'LastName': 'c'})

In [5]: fetch(frq)
2014-07-10 11:32:15+0200 [default] DEBUG: Redirecting (302) to <GET http://verify.tmcmed.org/iDirectory/applicationspecific/SearchStart.asp> from <POST http://verify.tmcmed.org/iDirectory/applicationspecific/search.asp>
2014-07-10 11:32:15+0200 [default] DEBUG: Redirecting (302) to <GET http://verify.tmcmed.org/iDirectory/applicationspecific/searchresults.asp> from <GET http://verify.tmcmed.org/iDirectory/applicationspecific/SearchStart.asp>
2014-07-10 11:32:17+0200 [default] DEBUG: Crawled (200) <GET http://verify.tmcmed.org/iDirectory/applicationspecific/searchresults.asp> (referer: None)
...

In [6]: response.css('input.nextbutton')
Out[6]: [<Selector xpath=u"descendant-or-self::input[@class and contains(concat(' ', normalize-space(@class), ' '), ' nextbutton ')]" data=u'<input type="submit" value=" Next 20 &gt'>]

In [7]: response.xpath('//input[@class="nextbutton"]')
Out[7]: [<Selector xpath='//input[@class="nextbutton"]' data=u'<input type="submit" value=" Next 20 &gt'>]

In [8]: 

Answer 2 (score: 0)

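As described in the Scrapy Selectors documentation, you can use an XPath expression on the element's attributes to check whether the element exists: extract_first() returns the given default when nothing matches.

Try it: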

is_exists = response.xpath("//input[@class='nextbutton']").extract_first(default='not-found')
if is_exists == 'not-found':
    # input does not exist
    pass
else:
    # input exists: crawl the next page
    pass