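In my application, I use Nokogiri to fetch the first link from example.com. Based on that link, it finds the page title and image and adds them to the database.
This is how I currently do it (this runs once an hour):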
    def nokogiri_link
      # This finds the first link
      website_url = ["http://www.example.com/news/", "http://www.sec-example.com/viral/", "http://www.fun-example.com/fun/"].sample
      doc_website = Nokogiri::HTML(open(website_url), nil, 'UTF-8')
      first_anchor = doc_website.at_css('.article_link').first
      # This finds title and image from first links page
      doc = Nokogiri::HTML(open(first_anchor[1]), nil, 'UTF-8')
      title = ""
      image_url = ""
      doc.xpath("//head//meta").each do |meta|
        if meta['property'] == 'og:title'
          title = meta['content']
        elsif meta['property'] == 'og:image'
          image_url = meta['content']
        end
      end
      @image_link = image_url
      @link = first_anchor[1]
      @title = title
    end
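This approach works well, but the websites I get the links from usually do not update their links every hour. What I want to achieve is this: when a website is picked (at random) from the `website_url` array and its first link is looked up, if the `title` already exists in the `posts` table, another entry should be picked from the `website_url` array and its first link tried instead.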
Answer 0 (score: 0)
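You can change the method so that it loops over the anchors instead of using only the first one. If the title is already in the database, `next` skips to the next iteration: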
    require 'nokogiri'
    require 'open-uri'

    def nokogiri_link
      website_url = "http://www.example.com/news/"
      # URI.open replaces Kernel#open for URLs on Ruby 3.0+
      doc_website = Nokogiri::HTML(URI.open(website_url), nil, 'UTF-8')
      # css returns every matching node; at_css returns only the first one
      doc_website.css('.article_link').each do |anchor|
        # Assumption: each .article_link element carries its URL in href
        link = anchor['href']
        doc = Nokogiri::HTML(URI.open(link), nil, 'UTF-8')
        title = ""
        image_url = ""
        doc.xpath("//head//meta").each do |meta|
          if meta['property'] == 'og:title'
            title = meta['content']
          elsif meta['property'] == 'og:image'
            image_url = meta['content']
          end
        end
        @image_link = image_url
        @link = link
        @title = title
        # Already stored: move on to the next anchor
        next if Post.exists?(title: title)
        Post.create!(title: title)
        break
      end
    end
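
The question also asks about moving on to another entry of the `website_url` array when the first link's title is already stored. Here is a minimal sketch of that control flow, under some assumptions: `pick_fresh_post`, `fetch_first_link`, `fetch_title` and `title_exists` are hypothetical names, and the lambdas stand in for the Nokogiri lookups and the `posts` table check so that the logic can be run without network or database access.

```ruby
# Hypothetical helper: try the sites in random order and return the first
# [link, title] pair whose title is not stored yet, or nil if none is new.
def pick_fresh_post(site_urls, fetch_first_link:, fetch_title:, title_exists:)
  site_urls.shuffle.each do |site_url|
    link = fetch_first_link.call(site_url)
    next if link.nil?                 # site had no matching anchor

    title = fetch_title.call(link)
    next if title_exists.call(title)  # already stored: try another site

    return [link, title]
  end
  nil
end

# Stand-in data instead of real HTTP requests and a real posts table.
FIRST_LINKS = {
  "http://www.example.com/news/"      => "http://www.example.com/news/1",
  "http://www.sec-example.com/viral/" => "http://www.sec-example.com/viral/7"
}.freeze
TITLES = {
  "http://www.example.com/news/1"      => "Old story",
  "http://www.sec-example.com/viral/7" => "New story"
}.freeze
STORED_TITLES = ["Old story"].freeze

RESULT = pick_fresh_post(
  FIRST_LINKS.keys,
  fetch_first_link: ->(url)   { FIRST_LINKS[url] },
  fetch_title:      ->(link)  { TITLES[link] },
  title_exists:     ->(title) { STORED_TITLES.include?(title) }
)
```

In the real method, `fetch_first_link` would presumably wrap the `.article_link` lookup, `fetch_title` the `og:title` scan, and `title_exists` a call such as `Post.exists?(title: title)`. Using `shuffle` rather than repeated `sample` calls means each site is tried at most once per run, so the loop always terminates.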