Trouble extracting data with Nokogiri

Time: 2016-05-10 01:45:57

Tags: ruby-on-rails ruby xml xpath nokogiri

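I'm practicing extracting data from an XML site, and I'm using Nokogiri to read and parse it. Eventually I need to analyze the data, but for now I'm just trying to get some output, without success.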

I have the following code:

require 'nokogiri'
require 'open-uri'

doc = Nokogiri::HTML(open("http://www.ibiblio.org/xml/examples/shakespeare/macbeth.xml"))

doc.xpath('//PERSONA').each do |char_element|
  puts char_element.text
end

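I just want to read the characters of the play from the XML site, but when I run the script in the terminal I get no results. I also tried writing simple xpath calls like the ones below: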

doc.xpath("//PERSONA")

doc.xpath("PLAY TITLE")

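I either get an error, or it behaves as if there were no input. I put in a simple function to test it, so I know the file is being read. Can anyone tell me what I'm doing wrong?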

1 answer:

Answer 0: (score: 1)

You are trying to read an XML file as if it were an HTML file. Try this example instead:

# On Ruby 3.0+, call URI.open; plain open no longer accepts URLs.
doc = Nokogiri::XML(URI.open("http://www.ibiblio.org/xml/examples/shakespeare/macbeth.xml"))

doc.xpath('//PERSONA').each{|ce| p ce.text }
"DUNCAN, king of Scotland."
"MALCOLM"
"DONALBAIN"
"MACBETH"
"BANQUO"
"MACDUFF"
"LENNOX"
"ROSS"
"MENTEITH"
"ANGUS"
"CAITHNESS"
"FLEANCE, son to Banquo."
"SIWARD, Earl of Northumberland, general of the English forces."
"YOUNG SIWARD, his son."
"SEYTON, an officer attending on Macbeth."
"Boy, son to Macduff. "
"An English Doctor. "
"A Scotch Doctor. "
"A Soldier."
"A Porter."
"An Old Man."
"LADY MACBETH"
"LADY MACDUFF"
"Gentlewoman attending on Lady Macbeth. "
"HECATE"
"Three Witches."
"Apparitions."
"Lords, Gentlemen, Officers, Soldiers, Murderers, Attendants, and Messengers. "

Make sure you use Nokogiri::XML instead of Nokogiri::HTML.