Question

In my application, I use Nokogiri to fetch the first link from example.com and, based on that link, find the page's title and image and add them to the database.

This is how I currently have it working (it will run once an hour):

def nokogiri_link
  # This finds the first link
  website_url = ["http://www.example.com/news/", "http://www.sec-example.com/viral/", "http://www.fun-example.com/fun/"].sample
  doc_website = Nokogiri::HTML(open(website_url), nil, 'UTF-8')
  first_anchor = doc_website.at_css('.article_link').first

  # This finds title and image from first links page
  doc = Nokogiri::HTML(open(first_anchor[1]), nil, 'UTF-8')

  title = ""
  image_url = ""

  doc.xpath("//head//meta").each do |meta|
    if meta['property'] == 'og:title'
      title = meta['content']
    elsif meta['property'] == 'og:image'
      image_url = meta['content']
    end
  end

  @image_link = image_url
  @link = first_anchor[1]
  @title = title
end

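This works fine, but the sites I pull links from usually don't update their links every hour. What I want to achieve is this: when a site is picked (at random) from the website_url array and its first link is looked up, if that title already exists in the posts table, the method should pick another site from the website_url array and try that site's first link instead.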

Answer 1

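You can change it so that the method loops over the anchors instead of using only the first one. If an anchor's title is found in the database, skip to the next iteration with next: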

def nokogiri_link
  website_url = "http://www.example.com/news/"
  doc_website = Nokogiri::HTML(open(website_url), nil, 'UTF-8')

  # css returns every matching anchor (at_css returns only the first match)
  doc_website.css('.article_link').each do |anchor|
    link = anchor['href']
    doc = Nokogiri::HTML(open(link), nil, 'UTF-8')
    title = ""
    image_url = ""

    doc.xpath("//head//meta").each do |meta|
      if meta['property'] == 'og:title'
        title = meta['content']
      elsif meta['property'] == 'og:image'
        image_url = meta['content']
      end
    end

    # Already stored: skip to the next anchor
    next if Post.exists?(title: title)

    @image_link = image_url
    @link = link
    @title = title
    Post.create!(title: title)
    break
  end
end
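
The question also asks about falling back to a different site when the first link's title is already stored. Here is a minimal sketch of just that selection logic, kept separate from the scraping so it runs without network or database access. The names pick_new_post, fetch_first and stored are hypothetical, and the pages hash with its article URLs is made up for illustration. In the app, fetch_first would presumably wrap the Nokogiri lookup above and stored would call Post.exists?(title: title).

```ruby
# Hypothetical helper, not part of Nokogiri or Rails.
# site_urls   - listing-page URLs to try, in random order
# fetch_first - callable: url -> { link:, title:, image_url: } or nil
# stored      - callable: title -> true when that title is already saved
def pick_new_post(site_urls, fetch_first:, stored:)
  site_urls.shuffle.each do |url|
    post = fetch_first.call(url)
    next if post.nil?                 # no usable first link on this site
    next if stored.call(post[:title]) # already saved, try another site
    return post
  end
  nil # every site's first link is already stored
end

# Stand-in data (assumed for illustration only).
pages = {
  "http://www.example.com/news/" =>
    { link: "http://www.example.com/news/1", title: "Old story", image_url: "" },
  "http://www.sec-example.com/viral/" =>
    { link: "http://www.sec-example.com/viral/9", title: "Fresh story", image_url: "" }
}
saved_titles = ["Old story"]

post = pick_new_post(
  pages.keys,
  fetch_first: ->(url) { pages[url] },
  stored: ->(title) { saved_titles.include?(title) }
)
puts post[:title]
```

Because "Old story" is already in saved_titles, the only post that can be returned here is "Fresh story", whichever order shuffle produces. When the method returns nil, the hourly job can simply do nothing until the next run.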

