How to get title and description data from an XML file

Time: 2015-09-01 19:24:57

Tags: ruby xml nokogiri

I am trying to write data to a text file in this format:

Channel|Date|Start time|Duration|Title|Desc

Here is a sample of the XML:

<!-- language: xml -->
<tv>
  <channel id="YLE TV1">
    <display-name lang="fi">YLE TV1</display-name>
  <programme start="20150828110000 +0300" stop="20150828110500 +0300" >channel="YLE TV1">
    <title lang="fi">Yle Uutiset</title>
  </programme>
  <programme start="20150828110500 +0300" stop="20150828111200 +0300" >channel="YLE TV1">
    <title lang="fi">Yle Uutiset Uusimaa</title>
    <desc lang="fi">Uutisia Uudeltamaalta.(n)</desc>
  </programme>
</tv>

Here is my Ruby code:

require 'rubygems'
require 'nokogiri'

open('myfile.out', 'a') do |f|
  doc = Nokogiri::XML(File.open("guidetv1.xml"))
  doc.css("programme").each do |response_node|

    strChannel = response_node["channel"]
    if(strChannel.eql? "YLE TV1")
      strChannel = "1"
    elsif(strChannel.eql? "YLE TV2")
      strChannel = "2"
    end
    strStart = response_node["start"]
    strStop = response_node["stop"]
    strTitle = response_node["title"]
    strDesc = response_node["desc"]
    f.puts strChannel + "|" + strStart + "|" + strStop + "|" + strTitle + "|" + strDesc
  end
end

How do I read the title and desc data? How do I check whether desc exists?

1 answer:

Answer 0 (score: 0)

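This is untested code. I modified the XML to make it syntactically correct. Read the CSV, Date/DateTime, and Time documentation to finish the lines marked "further processing needed".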

require 'csv'
require 'nokogiri'

doc = Nokogiri::XML(<<EOT)
<tv>
  <channel id="YLE TV1">
    <display-name lang="fi">YLE TV1</display-name>
  </channel>
  <programme start="20150828110000 +0300" stop="20150828110500 +0300" channel="YLE TV1">
    <title lang="fi">Yle Uutiset</title>
  </programme>
  <programme start="20150828110500 +0300" stop="20150828111200 +0300" channel="YLE TV1">
    <title lang="fi">Yle Uutiset Uusimaa</title>
    <desc lang="fi">Uutisia Uudeltamaalta.(n)</desc>
  </programme>
</tv>
EOT

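CSV.open(
  'file.csv', 'wb',
  col_sep: '|',        # separate fields with a pipe, as in the requested format
  write_headers: true, # without this, the headers below are not written to the file
  headers: ['Channel', 'Date', 'Start time', 'Duration', 'Title', 'Desc']
) do |csv|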
  doc.search('programme').each do |programme|
    channel = programme['channel']
    date = programme['start']       # => further processing needed
    start_time = programme['start'] # => further processing needed
    stop_time = programme['stop']   # => further processing needed
    duration = 0                    # => further processing needed
    title = programme.at('title').text
    desc = programme.at('desc')&.text || '' # <desc> is optional; at() returns nil when it is missing

    csv << [channel, date, start_time, duration, title, desc]
  end
end
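
The lines marked "further processing needed" could be filled in roughly as follows. This is a minimal sketch using only the standard library's `Time.strptime`; the helper name `schedule_fields`, the constant `XMLTV_TIME_FORMAT`, and the output formats (`YYYY-MM-DD`, `HH:MM`, duration in whole minutes) are assumptions made for illustration, since the question does not specify them.

```ruby
require 'time'

# Layout of the timestamps in the sample XML, e.g. "20150828110000 +0300".
XMLTV_TIME_FORMAT = '%Y%m%d%H%M%S %z'

# Hypothetical helper (not part of Nokogiri or the answer above):
# converts the raw start/stop attribute strings into
# [date, start time, duration in minutes].
def schedule_fields(start_attr, stop_attr)
  start_time = Time.strptime(start_attr, XMLTV_TIME_FORMAT)
  stop_time  = Time.strptime(stop_attr, XMLTV_TIME_FORMAT)

  date     = start_time.strftime('%Y-%m-%d')        # assumed date format
  clock    = start_time.strftime('%H:%M')           # assumed time format
  duration = ((stop_time - start_time) / 60).round  # Time difference is in seconds

  [date, clock, duration]
end

date, start_time, duration =
  schedule_fields('20150828110500 +0300', '20150828111200 +0300')
puts [date, start_time, duration].join('|')
```

Inside the loop, the same helper would be called as `schedule_fields(programme['start'], programme['stop'])`, and the three returned values would replace the placeholder `date`, `start_time`, and `duration` assignments.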