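I'm trying to scrape the list of projects from the UK government's UK Oil Portal, but my code isn't returning any data. What I want is an array of project titles.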
require 'open-uri'
require 'nokogiri'

class Entry
  attr_reader :title

  def initialize(title)
    @title = title
  end
end

def index
  @projects = Project.all
  # Ruby 3.0+ needs URI.open; Kernel#open no longer fetches URLs.
  doc = Nokogiri::HTML(URI.open("https://itportal.decc.gov.uk/pathfinder/currentprojectsindex.html"))
  entries = doc.css('.operator-container')
  @entries = []
  entries.each do |row|
    title = row.css('.setoutForm').text
    @entries << Entry.new(title)
  end
end
Answer (score: 3)
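The link you posted doesn't contain any data. The page you see is a frameset, and each frame is loaded from its own URL. You want to parse the left frame, so change your code to open the left frame's URL: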
doc = Nokogiri::HTML(URI.open('https://itportal.decc.gov.uk/eng/fox/path/PATH_REPORTS/current-projects-index'))
Each project is on its own page, so you'll need to open each one. For example, to open the first one:
project_file = URI.open(entries.first.css('a').attribute('href').value)
project_doc = Nokogiri::HTML(project_file)
The "setoutForm" class matches a lot of text. For example:
> project_doc.css('.setoutForm').text
=> "\n \n Field Type\n Location\n Water Depth (m)\n First Production\n Contact\n \n \n Oil\n 2/15\n 155m\n Q3/2018\n \n John Gill\n Business Development Manager\n jgill@alphapetroleum.com\n 01483 307204\n \n \n \n \n Project Summary\n \n \n \n The Cheviot discovery is located in blocks 2/10a, 2/15a and 3/11b. \n \n Reserves are approximately 46mmbbls oil.\n \n A Field Development Plan has been submitted and technically approved. The concept is for a leased FPSA with 18+ subsea wells. Oil export will be via tanker offloading.\n \n \n \n "
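Most of that string is layout whitespace. If you do want to keep some of it, one option is to collapse whitespace runs into single spaces, or split it into one entry per non-blank line. This is a plain-Ruby sketch; in a Rails app, ActiveSupport's `String#squish` does the same job as the hypothetical `squish` helper here, and `text_lines` is likewise a name chosen for illustration.

```ruby
# Collapse every run of whitespace (spaces, tabs, newlines) into a single
# space and trim both ends.
def squish(text)
  text.gsub(/[[:space:]]+/, ' ').strip
end

# Keep one entry per line, dropping lines that are blank after stripping.
def text_lines(text)
  text.lines.map(&:strip).reject(&:empty?)
end

# The start of the scraped string shown above.
raw = "\n \n Field Type\n Location\n Water Depth (m)\n First Production\n"

puts squish(raw)   # → Field Type Location Water Depth (m) First Production
p text_lines(raw)  # → ["Field Type", "Location", "Water Depth (m)", "First Production"]
```

The line-based version is handier if you later want to pair labels such as "Field Type" with their values.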
But the title isn't in that text. If you want the title, grab this part of the page:
<div class="field-header" foxid="eu1KcH_d4qniAjiN">Cheviot</div>
which you can do with this CSS selector:
> project_doc.css('.operator-container .field-header').text
=> "Cheviot"
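Write this code incrementally. Unless you step through it, it's hard to find where it's going wrong. For example, I use Nokogiri's command-line tool to open an interactive Ruby shell with: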
nokogiri https://itportal.decc.gov.uk/eng/fox/path/PATH_REPORTS/current-projects-index