Question

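I am trying to scrape multiple pages from a website. I want to scrape one page, click Next, fetch that page, and repeat until I reach the end. This is what I have written so far: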

page = agent.submit(form, form.buttons.first)
#submitting a form
while lien = page.link_with(:text=>'Next')
  # while I have a next link on page, keep scraping
  html_body = Nokogiri::HTML(body)
  links = html_body.css('.list').xpath("//table/tbody/tr/td[2]/a[1]")
  links.each do |link|
    purelink = link['href']
    puts purelink[/codeClub=([^&]*)/].gsub('codeClub=', '')
    lien.click
  end
end

Unfortunately, with this script I keep scraping the same page in an infinite loop... How can I achieve what I am trying to do?

Answer 1

I would try this: replace lien.click with page = lien.click.
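
The reassignment matters because `click` returns the newly loaded page and leaves the `page` variable untouched, so without it the `while` condition keeps finding the same Next link. Below is a minimal sketch of the loop with that fix. `each_page` is a hypothetical helper name, and `FakePage` and `FakeLink` are stand-ins invented here so the loop runs without a network; a real Mechanize page is assumed to respond to `link_with` and its links to `click` in the same way.

```ruby
# Stand-ins for Mechanize objects (hypothetical, for illustration only).
FakeLink = Struct.new(:target) do
  # Like a Mechanize link, clicking returns the page that was loaded.
  def click
    target
  end
end

FakePage = Struct.new(:name, :next_page) do
  # Mimics page.link_with(:text => 'Next'): returns a link or nil.
  def link_with(criteria)
    return nil unless next_page && criteria[:text] == 'Next'
    FakeLink.new(next_page)
  end
end

# Yields the first page and every page reached by following 'Next' links.
def each_page(page)
  loop do
    yield page
    lien = page.link_with(:text => 'Next')
    break unless lien
    page = lien.click # reassign, otherwise the same page is scraped forever
  end
end

last_page   = FakePage.new('page 3', nil)
middle_page = FakePage.new('page 2', last_page)
first_page  = FakePage.new('page 1', middle_page)

visited = []
each_page(first_page) { |current| visited << current.name }
puts visited.inspect
```

Checking for the Next link after scraping, rather than in the `while` condition, also means the last page (which has no Next link) still gets scraped.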

Answer 2

It looks like it should be more like this:

page = agent.submit(form, form.buttons.first)
# scrape stands for whatever method extracts the data you want from a page
scrape page
while lien = page.link_with(:text => 'Next')
  # follow the Next link and keep the page it returns
  page = lien.click
  scrape page
end

Also, you don't need to parse the page body with Nokogiri; Mechanize has already done that for you.
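
As a rough sketch of that last point: a Mechanize page exposes `search`, which forwards to the Nokogiri document it has already parsed, so the question's XPath can be run on `page` directly. `club_codes` is a hypothetical helper name, `StubPage` is a stand-in invented so the snippet runs without the gem, and the sample hrefs are made up.

```ruby
# Returns the codeClub values from the second-column links of a page.
# `page` is assumed to behave like a Mechanize page, i.e. to respond to
# `search` and return elements that support ['href'].
def club_codes(page)
  codes = page.search('//table/tbody/tr/td[2]/a[1]').map do |link|
    # String#[] with a capture index returns only the captured group,
    # or nil when the pattern does not match.
    link['href'].to_s[/codeClub=([^&]*)/, 1]
  end
  codes.compact
end

# Stand-in page (hypothetical) returning hash "links" with made-up hrefs.
StubPage = Struct.new(:links) do
  def search(_xpath)
    links
  end
end

page = StubPage.new([
  { 'href' => 'club.php?codeClub=A12&x=1' },
  { 'href' => 'club.php?other=1' },
  { 'href' => 'club.php?x=1&codeClub=B7' }
])
codes = club_codes(page)
puts codes.inspect
```

Using the capture index avoids the `gsub` call in the question, which raises an error when a link has no codeClub parameter.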

Scraping consecutive pages through to the last page using Nokogiri and Mechanize
