How do I extract arrays of hashes from existing text files?

Asked: 2016-06-20 09:51:30

Tags: ruby parsing erb

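I have a messy situation. I'm migrating a large amount of data created in a now-defunct static site generator (webby), where much of the data is stored in quasi-ERB template files.

I'm trying to write some Ruby to parse these files, grab what I need, and write it into the data files of my new application.

The problem I'm running into is that the existing files aren't very normalized, and matching on certain patterns is tricky.

For example, each "event" (this is for a technical conference website) has a _sponsors.txt file containing information about that event's sponsors, structured as a series of arrays of hashes. The arrays aren't always exactly the same, but they are usually similar.

Here is a snippet from one of the files: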

<% @psponsors = [
{ :image => 'ca_technologies.png', :name => 'CA Technologies', :link =>     'http://www.ca.com/fr', :width => '100px', :height => '100px' },
{ :image => 'puppetlabs.png', :name => 'PuppetLabs', :link => 'https://puppetlabs.com', :width => '100px', :height => '100px' },
{ :image => 'microsoft_azure.png', :name => 'Microsoft Azure', :link => 'http://www.microsoft.com/click/services/Redirect2.ashx?CR_CC=200618989', :width => '100px', :height => '100px' },
] %>
<% if @psponsors.empty? %>
<i>&nbsp;&nbsp;&nbsp;<a href='<%= File.join('/',@eventhome,'/sponsor') -%>'>Be the first to sponsor!</a></i>
<% end %>
<% @psponsors.each do |sponsor| %>
<a href="<%= sponsor[:link] %>"><img border="1" alt="<%= sponsor[:name] %>" title="<%= sponsor[:name] %>" width="<%= sponsor[:width] %>" height="<%= sponsor[:height] %>" src="<%= File.join('/',@eventhome,"logos/#{sponsor[:image]}") %>" /></a>
<% end %>
<h1>Gold sponsors</h1>
<% @gsponsors = [
{ :image => 'normation.png', :name => 'Normation', :link => 'http://www.normation.com', :width => '100px', :height => '100px' },
{ :image => 'gandi.png', :name => 'Gandi.net', :link => 'https://www.gandi.net', :width => '100px', :height => '100px' },
{ :image => 'xebialabs.png', :name => 'XebiaLabs', :link => 'http://www.xebialabs.com', :width => '100px', :height => '100px' },
{ :image => 'redhat.png', :name => 'Red Hat', :link => 'https://www.redhat.com', :width => '100px', :height => '100px' },
{ :image => 'delphix.png', :name => 'Delphix', :link => 'http://delphix.com', :width => '100px', :height => '100px' },
{ :image => 'chef.png', :name => 'Chef', :link => 'http://chef.io', :width => '100px', :height => '100px' },
] %>

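When I read in the whole file and match on the outer delimiters I'm looking for, I end up with a bunch of matches I don't want. My current workaround is to read each line, set a state flag when a line matches the correct opening, keep reading, and break when I reach the end of the array. This feels thoroughly inelegant, and I'm sure I'm missing a cleaner way to do it.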
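For reference, the line-by-line workaround described above might look roughly like the sketch below. The method name `sponsor_blocks` and both regular expressions are assumptions made for illustration, and the sketch assumes each array opens with `<% @name = [` and closes with `] %>` on lines of their own, as in the snippet.

```ruby
# Hypothetical helper (not from the original files): collect the raw lines
# found between "<% @name = [" and "] %>" for each array in a template.
# Assumes the opening and closing markers each sit on their own line.
def sponsor_blocks(text)
  blocks = {}
  current = nil # name of the array being read, or nil when outside one

  text.each_line do |line|
    if current.nil? && (match = line.match(/<%\s*@(\w+)\s*=\s*\[/))
      # Opening line: remember which array we are inside.
      current = match[1]
      blocks[current] = []
    elsif current && line.match?(/\A\s*\]\s*%>/)
      # Closing line: leave the array.
      current = nil
    elsif current
      # Any line in between is one hash literal, kept as raw text.
      blocks[current] << line.strip
    end
  end

  blocks
end

sample = <<~'TEMPLATE'
  <% @psponsors = [
  { :image => 'puppetlabs.png', :name => 'PuppetLabs' },
  ] %>
  <% if @psponsors.empty? %>
  <i>Be the first to sponsor!</i>
  <% end %>
  <% @gsponsors = [
  { :image => 'chef.png', :name => 'Chef' },
  ] %>
TEMPLATE

sponsor_blocks(sample).each do |name, lines|
  puts "#{name}: #{lines.length} entries"
end
```

This only isolates the text of each hash; turning those strings into real hashes still needs a further parsing step.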

2 answers:

Answer 0 (score: 2)

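Instead of trying to parse these files, why not try executing them? Just add some code that dumps those instance variables (or all of them: instance_variables.each { |varname| ...) as JSON to stdout or something similar, and then run the file through the ERB interpreter.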
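A minimal sketch of that idea, assuming Ruby 2.6 or later (for the `trim_mode:` keyword) and assuming the files are trusted, since rendering a template runs whatever Ruby it contains. The names `TemplateScope` and `extract_instance_variables` are hypothetical, and the sample template is a shortened stand-in for a real _sponsors.txt file.

```ruby
require 'erb'
require 'json'

# Hypothetical scope object: the template is evaluated against it, so every
# @variable the template assigns becomes an instance variable of this object.
class TemplateScope
  def initialize(eventhome)
    @eventhome = eventhome # the templates read @eventhome when building links
  end

  def template_binding
    binding
  end
end

# Render the template for its side effects and return what it assigned.
# Assumes the file is trusted: rendering executes the Ruby inside it.
def extract_instance_variables(source, eventhome: '')
  scope = TemplateScope.new(eventhome)
  # trim_mode '-' is needed because the templates use the "-%>" closing tag.
  ERB.new(source, trim_mode: '-').result(scope.template_binding)

  scope.instance_variables.each_with_object({}) do |name, vars|
    next if name == :@eventhome # skip the variable we supplied ourselves
    vars[name.to_s.delete_prefix('@')] = scope.instance_variable_get(name)
  end
end

sample = <<~'TEMPLATE'
  <% @psponsors = [
  { :image => 'puppetlabs.png', :name => 'PuppetLabs', :link => 'https://puppetlabs.com' },
  ] %>
  <% if @psponsors.empty? %>
  <a href='<%= File.join('/',@eventhome,'/sponsor') -%>'>Be the first to sponsor!</a>
  <% end %>
  <% @gsponsors = [
  { :image => 'chef.png', :name => 'Chef', :link => 'http://chef.io' },
  ] %>
TEMPLATE

puts JSON.pretty_generate(extract_instance_variables(sample))
```

Because Ruby itself evaluates the array literals, differences in whitespace or key order between files stop mattering; the rendered HTML is simply discarded.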

Answer 1 (score: 0)

What about trying nokogiri with xpath?

erb_template.erb

<% @psponsors = [
    {:image => 'ca_technologies.png', :name => 'CA Technologies', :link => 'http://www.ca.com/fr', :width => '100px', :height => '100px'},
    {:image => 'puppetlabs.png', :name => 'PuppetLabs', :link => 'https://puppetlabs.com', :width => '100px', :height => '100px'},
    {:image => 'microsoft_azure.png', :name => 'Microsoft Azure', :link => 'http://www.microsoft.com/click/services/Redirect2.ashx?CR_CC=200618989', :width => '100px', :height => '100px'},
] %>
<% if @psponsors.empty? %>
    <i>&nbsp;&nbsp;&nbsp;<a href='http://localhost'>Be the first to sponsor!</a></i>
<% end %>
<% @psponsors.each do |sponsor| %>
    <a href="<%= sponsor[:link] %>"><img border="1" alt="<%= sponsor[:name] %>" title="<%= sponsor[:name] %>" width="<%= sponsor[:width] %>" height="<%= sponsor[:height] %>" src="<%= File.join('/', @eventhome, "logos/#{sponsor[:image]}") %>"/></a>
<% end %>
<h1>Gold sponsors</h1>
<% @gsponsors = [
    {:image => 'normation.png', :name => 'Normation', :link => 'http://www.normation.com', :width => '100px', :height => '100px'},
    {:image => 'gandi.png', :name => 'Gandi.net', :link => 'https://www.gandi.net', :width => '100px', :height => '100px'},
    {:image => 'xebialabs.png', :name => 'XebiaLabs', :link => 'http://www.xebialabs.com', :width => '100px', :height => '100px'},
    {:image => 'redhat.png', :name => 'Red Hat', :link => 'https://www.redhat.com', :width => '100px', :height => '100px'},
    {:image => 'delphix.png', :name => 'Delphix', :link => 'http://delphix.com', :width => '100px', :height => '100px'},
    {:image => 'chef.png', :name => 'Chef', :link => 'http://chef.io', :width => '100px', :height => '100px'},
] %>
<% @gsponsors.each do |sponsor| %>
    <a href="<%= sponsor[:link] %>"><img border="1" alt="<%= sponsor[:name] %>" title="<%= sponsor[:name] %>" width="<%= sponsor[:width] %>" height="<%= sponsor[:height] %>" src="<%= File.join('/', @eventhome, "logos/#{sponsor[:image]}") %>"/></a>
<% end %>

Process

# encoding: utf-8

require 'nokogiri'
require 'erb'

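# The template reads @eventhome, so define it before rendering.
@eventhome = ""

path = "./erb_template.erb"
# Render the template, then parse the resulting HTML.
doc = Nokogiri::HTML(ERB.new(File.read(path)).result(binding))

# Select every <a> element that directly contains an <img>.
links = doc.xpath("//a[./img]")

# Map each sponsor link to the name stored in the image's title attribute.
export = links.each_with_object({}) do |element, h|
  h[element["href"]] = element.first_element_child["title"]
end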

Output

# {
#   "http://www.ca.com/fr" => "CA Technologies",
#   "https://puppetlabs.com" => "PuppetLabs",
#   "http://www.microsoft.com/click/services/Redirect2.ashx?CR_CC=200618989" => "Microsoft Azure",
#   "http://www.normation.com" => "Normation",
#   "https://www.gandi.net" => "Gandi.net",
#   "http://www.xebialabs.com" => "XebiaLabs",
#   "https://www.redhat.com" => "Red Hat",
#   "http://delphix.com" => "Delphix",
#   "http://chef.io" => "Chef"
# }