Scrapy: how to change the value in a form input, submit the form, and then scrape the page

Date: 2019-09-05 15:19:27

Tags: web-scraping scrapy scrapy-splash

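I want to enter a value into a text input field, submit the form, and then scrape the new data that appears on the page after the form has been submitted. How can this be done?

This is the HTML form on the page. I want to change the input value from 10 to 100 and submit the form.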

<form action="https://de.iss.fst.com/ba-u6-72-nbr-902-112-x-140-x-13-12-mm-simmerringr-ba-a-mit-feder-fst-40411416#product-offers-anchor" method="post" _lpchecked="1">
            <div class="fieldset">
               <div class="field qty">
                  <div class="control">
                        <label class="label" for="qty-2">
                           <span>Preise für</span>
                        </label>
                        <input type="text" name="pieces" class="validate-length maximum-length-10 qty" maxlength="12" id="qty-2" value="10">
                        <label class="label" for="qty-2">
                           <span>Teile</span>
                        </label>
                        <span class="actions">
                           <button type="submit" title="Absenden" class="action">
                              <span>Absenden</span>
                           </button>
                        </span>
                  </div>
               </div>
            </div>
      </form>

Update! New working code.

import scrapy
import pymongo
from scrapy_splash import SplashRequest, SplashFormRequest
from issfst.items import IssfstItem


class IssSpider(scrapy.Spider):
    name = "issfst_spider"
    start_urls = ["https://de.iss.fst.com/dichtungen/radialwellendichtringe/rwdr-mit-geschlossenem-kafig/ba"]
    custom_settings = {
        # specifies exported fields and order
        'FEED_EXPORT_FIELDS': ["imgurl",
                               "Produktdatenblatt",
                               "Materialdatenblatt",]
    }

    def parse(self, response):
        self.log("I just visited: " + response.url)
        urls = response.css('.details-button > a::attr(href)').extract()

        for url in urls:
            formdata = {'pieces': '200'}
            yield SplashFormRequest.from_response(
                response,
                url=url,
                formdata=formdata,
                callback=self.parse_details,
                args={'wait': 3}
            )

        # follow pagination link
        next_page_url = response.css('li.item  > a.next::attr(href)').extract_first()
        if next_page_url:
            next_page_url = response.urljoin(next_page_url)
            yield scrapy.Request(url=next_page_url, callback=self.parse)

    def parse_details(self, response):
        item = IssfstItem()
        # scrape image url
        item['imgurl'] = response.css('img.fotorama__img::attr(src)').extract()
        # scrape download pdf links
        item['Produktdatenblatt'] = response.css('a.action[data-group="productdatasheet"]::attr(href)').extract_first()
        item['Materialdatenblatt'] = response.css('a.action[data-group="materialdatasheet"]::attr(href)').extract_first()
        item['Beschreibung'] = response.css('.description > p::text').extract_first()
        yield item

1 Answer:

Answer 0: (score: 1)

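You should not rely on the HTML source to find out the parameter names of a POST request. Use the developer tools of your favorite browser and watch the Network tab with log persistence enabled.

So, the request you are looking for is a POST to the URL https://de.iss.fst.com/ba-72-nbr-902-155-x-174-x-12-0-mm-simmerringr-ba-a-mit-feder-fst-40411424#product-offers-anchor with the parameters pieces and form_key.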

[Screenshot: Firefox's developer tools (French version) showing the POST request and its parameters]

Setting the form data with the wrong name 'value' is a mistake when the website expects the name 'pieces'.
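To make the parameter names concrete, here is a minimal sketch of what the URL-encoded POST body looks like, using only the standard library. The form_key value "abc123" is a made-up placeholder (the real one has to be read from the product page), and the variable names are only for illustration.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical placeholder; the real form_key comes from a hidden input on the page.
form_key = "abc123"

# Body built with a parameter name that does not appear in the logged POST.
wrong_body = urlencode({"value": "100"})

# Body built with the two parameter names seen in the network log.
right_body = urlencode({"form_key": form_key, "pieces": "100"})

print(wrong_body)
print(right_body)

# parse_qs shows which parameter names the server would actually receive.
print(sorted(parse_qs(right_body)))
```

Comparing the two bodies against the request shown in the browser's network log is a quick way to check that the names match before running the spider.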

Now, as a demonstration in a scrapy shell session:

scrapy shell "https://de.iss.fst.com/ba-72-nbr-902-155-x-174-x-12-0-mm-simmerringr-ba-a-mit-feder-fst-40411424"
... 
from scrapy import FormRequest

## SET THE POST PARAMETERS
form_key = response.css('[name="form_key"]::attr(value)').get()
# Note: response.xpath('input[@name="form_key"]/@value') returns nothing because
# the path is relative, so it only matches <input> directly under the root element.
# An XPath that works for this hidden input is '//input[@name="form_key"]/@value'.
pieces = "100"
form_data = {'form_key': form_key, 'pieces': pieces}  # with the correct names

## SEND THE POST REQUEST
fetch(
    FormRequest(
        'https://de.iss.fst.com/ba-72-nbr-902-155-x-174-x-12-0-mm-simmerringr-ba-a-mit-feder-fst-40411424#product-offers-anchor',
        formdata=form_data,
    )
)  # note the '#product-offers-anchor' appended to the URL; otherwise it won't work
view(response)  # open the page in your default browser

Now you can adapt the above to your own code.