Iteration returns the same result when crawling

Time: 2020-02-07 10:11:43

Tags: scrapy iteration

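I am new to Scrapy and am working through the manual. I am doing some exercises and am stuck on this problem. When I iterate over the list of books, every iteration returns the same "key: value" pair, even though the page actually contains 20 different elements.

Here is my code: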

import scrapy


class MyBooks(scrapy.Spider):
    name = 'bookstore'
    allowed_domains = ['books.toscrape.com']
    start_urls = ['http://books.toscrape.com']

    def parse(self, response):
        for book in response.xpath('//article[@class="product_pod"]'):
            yield {

                'title': book.xpath('//h3/a/text()').get(),
                'price': book.xpath('//p[@class="price_color"]/text()').get(),

            }

Why is that? What am I doing wrong?

1 answer:

Answer 0 (score: 0)

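I'm not very familiar with XPath selectors, but it looks like book.xpath('//h3/a/text()') and book.xpath('//p[@class="price_color"]/text()') each return a selector list containing the data for every book, most likely because an expression that starts with // is matched against the whole document rather than relative to book. To confirm this, call .getall() instead of .get() on those selectors and you will see a list with the results for every book. I was able to get it working with CSS selectors, though: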

    def parse(self, response):
        for book in response.xpath('//article[@class="product_pod"]'):
            yield {
                'title': book.css('h3').css('a::text').get(),
                'price': book.css('.price_color::text').get()
            }
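
The difference between a document-wide search and a search relative to the current node can be sketched without Scrapy. The example below is only an illustration: it uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy selectors, and the markup, titles, and prices are hypothetical, simplified versions of the product_pod articles.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup modelled on the product_pod articles.
HTML = """
<section>
  <article class="product_pod">
    <h3><a>Book A</a></h3>
    <p class="price_color">10.00</p>
  </article>
  <article class="product_pod">
    <h3><a>Book B</a></h3>
    <p class="price_color">20.00</p>
  </article>
</section>
"""

root = ET.fromstring(HTML)
books = root.findall(".//article[@class='product_pod']")

# Searching from the document root on every iteration:
# the first match in the whole document is returned each time.
doc_wide = [root.find(".//h3/a").text for _ in books]

# Searching relative to the current article:
# each iteration sees only that article's own children.
relative = [
    {
        "title": book.find(".//h3/a").text,
        "price": book.find(".//p[@class='price_color']").text,
    }
    for book in books
]

print(doc_wide)
print(relative)
```

In Scrapy itself, the XPath version of the same idea would be to prefix the expressions with a dot, for example book.xpath('.//h3/a/text()').get(), so that they are evaluated relative to book.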

You can read more about selectors in the Scrapy documentation.