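I have a messy situation. I'm migrating a large amount of data created in a now-defunct static site generator (webby), and much of that data is stored in quasi-ERB template files.
I'm trying to write some Ruby to parse these files, grab what I need, and write it out to the data files for my new application.
The problem I'm running into is that the existing files are not very normalized, so matching certain patterns is tricky.
For example, each "event" (this is for a tech conference website) has a _sponsors.txt file containing information about that event's sponsors, structured as a series of arrays of hashes. The arrays are not always exactly the same, but they are generally similar.
Here is a snippet from one of those files: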
<% @psponsors = [
{ :image => 'ca_technologies.png', :name => 'CA Technologies', :link => 'http://www.ca.com/fr', :width => '100px', :height => '100px' },
{ :image => 'puppetlabs.png', :name => 'PuppetLabs', :link => 'https://puppetlabs.com', :width => '100px', :height => '100px' },
{ :image => 'microsoft_azure.png', :name => 'Microsoft Azure', :link => 'http://www.microsoft.com/click/services/Redirect2.ashx?CR_CC=200618989', :width => '100px', :height => '100px' },
] %>
<% if @psponsors.empty? %>
<i> <a href='<%= File.join('/',@eventhome,'/sponsor') -%>'>Be the first to sponsor!</a></i>
<% end %>
<% @psponsors.each do |sponsor| %>
<a href="<%= sponsor[:link] %>"><img border="1" alt="<%= sponsor[:name] %>" title="<%= sponsor[:name] %>" width="<%= sponsor[:width] %>" height="<%= sponsor[:height] %>" src="<%= File.join('/',@eventhome,"logos/#{sponsor[:image]}") %>" /></a>
<% end %>
<h1>Gold sponsors</h1>
<% @gsponsors = [
{ :image => 'normation.png', :name => 'Normation', :link => 'http://www.normation.com', :width => '100px', :height => '100px' },
{ :image => 'gandi.png', :name => 'Gandi.net', :link => 'https://www.gandi.net', :width => '100px', :height => '100px' },
{ :image => 'xebialabs.png', :name => 'XebiaLabs', :link => 'http://www.xebialabs.com', :width => '100px', :height => '100px' },
{ :image => 'redhat.png', :name => 'Red Hat', :link => 'https://www.redhat.com', :width => '100px', :height => '100px' },
{ :image => 'delphix.png', :name => 'Delphix', :link => 'http://delphix.com', :width => '100px', :height => '100px' },
{ :image => 'chef.png', :name => 'Chef', :link => 'http://chef.io', :width => '100px', :height => '100px' },
] %>
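When I read in the whole file and try to match the outer delimiters I'm looking for, I end up with a bunch of matches I don't want. My current workaround is to read each line, set a state when a line matches the right start, keep reading, and break when I reach the end. This feels thoroughly unpleasant, and I'm sure I'm missing a more elegant way to do it.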
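For reference, the line-by-line workaround described above might look roughly like the sketch below. This is not the asker's actual code: the helper name `sponsor_blocks`, the regular expressions, and the shortened sample template are all assumptions made for illustration.

```ruby
# Hypothetical helper: collect the raw lines of each `<% @name = [ ... ] %>` block.
def sponsor_blocks(lines)
  blocks = {}
  current = nil # name of the array being collected, or nil when outside a block
  lines.each do |line|
    if current.nil?
      # Outside a block: look for the start of an array assignment.
      match = line.match(/\A<%\s*@(\w+)\s*=\s*\[/)
      if match
        current = match[1]
        blocks[current] = []
      end
    elsif line.match?(/\A\]\s*%>/)
      # Closing `] %>` reached: stop collecting.
      current = nil
    else
      # Inside a block: keep the hash literal line as text.
      blocks[current] << line.strip
    end
  end
  blocks
end

# Shortened, made-up sample in the same shape as the _sponsors.txt files.
sample = <<~'TEMPLATE'
  <% @psponsors = [
  { :image => 'puppetlabs.png', :name => 'PuppetLabs' },
  ] %>
  <% if @psponsors.empty? %>
  <% end %>
  <h1>Gold sponsors</h1>
  <% @gsponsors = [
  { :image => 'chef.png', :name => 'Chef' },
  { :image => 'gandi.png', :name => 'Gandi.net' },
  ] %>
TEMPLATE

blocks = sponsor_blocks(sample.each_line)
blocks.each { |name, rows| puts "#{name}: #{rows.length} entries" }
```

The sketch only collects the hash lines as strings; turning them into real hashes would still need a further step, which is part of what makes this approach feel clumsy.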
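Answer 0 (score: 2)
Rather than trying to parse these files, why not try executing them? Just add some code that dumps these instance variables (or all of them: instance_variables.each { |varname| ...
) as JSON to stdout or something similar, then run the file through the ERB interpreter.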
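A minimal sketch of that idea follows, under some assumptions: the `TemplateContext` class, the `extract_instance_variables` helper, and the shortened sample template are hypothetical names invented here, and `trim_mode: '-'` is passed because the original templates use the `-%>` closing tag. Since this executes whatever Ruby the template contains, it is only suitable for files you trust.

```ruby
require 'erb'
require 'json'

# Hypothetical context object: the template's instance variables end up here,
# isolated from the rest of the script.
class TemplateContext
  def initialize(eventhome)
    @eventhome = eventhome
  end

  def context_binding
    binding
  end
end

# Hypothetical helper: render the template for its side effects, then
# collect every instance variable it assigned.
def extract_instance_variables(source, eventhome: '')
  context = TemplateContext.new(eventhome)
  # trim_mode '-' lets ERB accept the `-%>` tags used in the original files.
  ERB.new(source, trim_mode: '-').result(context.context_binding)
  context.instance_variables.each_with_object({}) do |varname, vars|
    vars[varname.to_s.delete_prefix('@')] = context.instance_variable_get(varname)
  end
end

# Shortened, made-up sample in the same shape as the _sponsors.txt files.
sample = <<~'TEMPLATE'
  <% @psponsors = [
  { :image => 'puppetlabs.png', :name => 'PuppetLabs', :link => 'https://puppetlabs.com' },
  ] %>
  <% @psponsors.each do |sponsor| %>
  <a href="<%= sponsor[:link] %>"><%= sponsor[:name] %></a>
  <% end %>
TEMPLATE

data = extract_instance_variables(sample)
puts JSON.pretty_generate(data)
```

Because the arrays are evaluated by Ruby itself, small formatting differences between files (spacing, trailing commas) stop mattering, which is the main appeal over pattern matching.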
Answer 1 (score: 0)
What about trying nokogiri and xpath?
erb_template.erb:
<% @psponsors = [
{:image => 'ca_technologies.png', :name => 'CA Technologies', :link => 'http://www.ca.com/fr', :width => '100px', :height => '100px'},
{:image => 'puppetlabs.png', :name => 'PuppetLabs', :link => 'https://puppetlabs.com', :width => '100px', :height => '100px'},
{:image => 'microsoft_azure.png', :name => 'Microsoft Azure', :link => 'http://www.microsoft.com/click/services/Redirect2.ashx?CR_CC=200618989', :width => '100px', :height => '100px'},
] %>
<% if @psponsors.empty? %>
<i> <a href='http://localhost'>Be the first to sponsor!</a></i>
<% end %>
<% @psponsors.each do |sponsor| %>
<a href="<%= sponsor[:link] %>"><img border="1" alt="<%= sponsor[:name] %>" title="<%= sponsor[:name] %>" width="<%= sponsor[:width] %>" height="<%= sponsor[:height] %>" src="<%= File.join('/', @eventhome, "logos/#{sponsor[:image]}") %>"/></a>
<% end %>
<h1>Gold sponsors</h1>
<% @gsponsors = [
{:image => 'normation.png', :name => 'Normation', :link => 'http://www.normation.com', :width => '100px', :height => '100px'},
{:image => 'gandi.png', :name => 'Gandi.net', :link => 'https://www.gandi.net', :width => '100px', :height => '100px'},
{:image => 'xebialabs.png', :name => 'XebiaLabs', :link => 'http://www.xebialabs.com', :width => '100px', :height => '100px'},
{:image => 'redhat.png', :name => 'Red Hat', :link => 'https://www.redhat.com', :width => '100px', :height => '100px'},
{:image => 'delphix.png', :name => 'Delphix', :link => 'http://delphix.com', :width => '100px', :height => '100px'},
{:image => 'chef.png', :name => 'Chef', :link => 'http://chef.io', :width => '100px', :height => '100px'},
] %>
<% @gsponsors.each do |sponsor| %>
<a href="<%= sponsor[:link] %>"><img border="1" alt="<%= sponsor[:name] %>" title="<%= sponsor[:name] %>" width="<%= sponsor[:width] %>" height="<%= sponsor[:height] %>" src="<%= File.join('/', @eventhome, "logos/#{sponsor[:image]}") %>"/></a>
<% end %>
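Process:
# encoding: utf-8
require 'nokogiri'
require 'erb'

# The template reads @eventhome when it builds the logo paths.
@eventhome = ""
path = File.join("./erb_template.erb")
# Render the template, then parse the resulting HTML.
doc = Nokogiri::HTML(ERB.new(File.read(path)).result(binding))
# Select every <a> element that directly contains an <img>.
links = doc.xpath("//a[./img]")
# Map each sponsor link to the title of its image.
export = links.each_with_object({}) do |element, h|
  h[element["href"]] = element.first_element_child["title"]
end
pp export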
Output:
# {
# "http://www.ca.com/fr" => "CA Technologies",
# "https://puppetlabs.com" => "PuppetLabs",
# "http://www.microsoft.com/click/services/Redirect2.ashx?CR_CC=200618989" => "Microsoft Azure",
# "http://www.normation.com" => "Normation",
# "https://www.gandi.net" => "Gandi.net",
# "http://www.xebialabs.com" => "XebiaLabs",
# "https://www.redhat.com" => "Red Hat",
# "http://delphix.com" => "Delphix",
# "http://chef.io" => "Chef"
# }