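I am trying to parse, rather than evaluate, and modify ERB files in a Hpricot/Nokogiri-style way. The files I am trying to parse contain HTML fragments intermixed with dynamic content generated using ERB (standard Rails view files). I am looking for a library that will not only parse the surrounding content, much as Hpricot or Nokogiri would, but will also treat the ERB symbols, <%, <%= etc., as though they were HTML/XML tags.
Ideally I would get back a DOM-like structure in which the <%, <%= etc. symbols are included as their own node types.
I know it is possible to hack something together using regular expressions, but I am looking for something more reliable, because I am developing a tool that I need to run on a very large view code base where both the HTML content and the ERB content matter.
For example, content such as: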
blah blah blah <div>My Great Text <%= my_dynamic_expression %></div>
would return a tree structure such as:
root
  - text_node (blah blah blah)
  - element (div)
    - text_node (My Great Text )
    - erb_node (<%=)
Answer 0 (score: 3)
I eventually solved this by using RLex, http://raa.ruby-lang.org/project/ruby-lex/, a Ruby version of lex, with the following grammar:
%{
#define NUM 257
#define OPTOK 258
#define IDENT 259
#define OPETOK 260
#define CLSTOK 261
#define CLTOK 262
#define FLOAT 263
#define FIXNUM 264
#define WORD 265
#define STRING_DOUBLE_QUOTE 266
#define STRING_SINGLE_QUOTE 267
#define TAG_START 268
#define TAG_END 269
#define TAG_SELF_CONTAINED 270
#define ERB_BLOCK_START 271
#define ERB_BLOCK_END 272
#define ERB_STRING_START 273
#define ERB_STRING_END 274
#define TAG_NO_TEXT_START 275
#define TAG_NO_TEXT_END 276
#define WHITE_SPACE 277
%}

digit [0-9]
blank [ ]
letter [A-Za-z]
name1 [A-Za-z_]
name2 [A-Za-z_0-9]
valid_tag_character [A-Za-z0-9"'=@_():/ ]
ignore_tags style|script

%%

{blank}+"\n" { return [ WHITE_SPACE, yytext ] }
"\n"{blank}+ { return [ WHITE_SPACE, yytext ] }
{blank}+"\n"{blank}+ { return [ WHITE_SPACE, yytext ] }
"\r" { return [ WHITE_SPACE, yytext ] }
"\n" { return[ yytext[0], yytext[0..0] ] };
"\t" { return[ yytext[0], yytext[0..0] ] };
^{blank}+ { return [ WHITE_SPACE, yytext ] }
{blank}+$ { return [ WHITE_SPACE, yytext ] };
"" { return [ TAG_NO_TEXT_START, yytext ] }
"" { return [ TAG_NO_TEXT_END, yytext ] }
"" { return [ TAG_SELF_CONTAINED, yytext ] }
"" { return [ TAG_SELF_CONTAINED, yytext ] }
"" { return [ TAG_START, yytext ] }
"" { return [ TAG_END, yytext ] }
"" { return [ ERB_BLOCK_END, yytext ] }
"" { return [ ERB_STRING_END, yytext ] }
{letter}+ { return [ WORD, yytext ] }
\".*\" { return [ STRING_DOUBLE_QUOTE, yytext ] }
'.*' { return [ STRING_SINGLE_QUOTE, yytext ] }
. { return [ yytext[0], yytext[0..0] ] }

%%
This is not a complete grammar, but for my purposes, locating and re-emitting text, it works. I combined the grammar with this small piece of code:
text_handler = MakeYourOwnCallbackHandler.new

l = Erblex.new
l.yyin = File.open(file_name, "r")

loop do
  # yylex returns a [token, value] pair; a token of 0 means end of input
  a, v = l.yylex
  break if a == 0

  if a < WORD
    # Anything below WORD is a single character returned as its own token
    text_handler.character(v.to_s, a)
  else
    case a
    when WORD
      text_handler.text(v.to_s)
    when TAG_START
      text_handler.start_tag(v.to_s)
    when TAG_END
      text_handler.end_tag(v.to_s)
    when WHITE_SPACE
      text_handler.white_space(v.to_s)
    when ERB_BLOCK_START
      text_handler.erb_block_start(v.to_s)
    when ERB_BLOCK_END
      text_handler.erb_block_end(v.to_s)
    when ERB_STRING_START
      text_handler.erb_string_start(v.to_s)
    when ERB_STRING_END
      text_handler.erb_string_end(v.to_s)
    when TAG_NO_TEXT_START
      text_handler.ignorable_tag_start(v.to_s)
    when TAG_NO_TEXT_END
      text_handler.ignorable_tag_end(v.to_s)
    when STRING_DOUBLE_QUOTE
      text_handler.string_double_quote(v.to_s)
    when STRING_SINGLE_QUOTE
      text_handler.string_single_quote(v.to_s)
    when TAG_SELF_CONTAINED
      text_handler.tag_self_contained(v.to_s)
    end
  end
end
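The loop above expects a handler object that responds to one method per token type. A minimal sketch of such a handler might look like the following. The class name ReemittingHandler and its behavior (re-emit everything unchanged, record the words) are assumptions made for illustration only; the answer leaves the handler up to the reader.

```ruby
# Hypothetical handler: the class name and what it collects are illustrative,
# not taken from the original answer.
class ReemittingHandler
  attr_reader :output, :words

  def initialize
    @output = +''  # everything seen so far, re-emitted unchanged
    @words  = []   # WORD tokens, kept for later inspection
  end

  # Single characters arrive together with their token id, ignored here.
  def character(text, _token)
    @output << text
  end

  # Plain words are recorded as well as re-emitted.
  def text(value)
    @words << value
    @output << value
  end

  # Every other callback used by the loop simply passes its text through.
  %i[start_tag end_tag white_space
     erb_block_start erb_block_end erb_string_start erb_string_end
     ignorable_tag_start ignorable_tag_end
     string_double_quote string_single_quote tag_self_contained].each do |name|
    define_method(name) { |value| @output << value }
  end
end

# Driving the handler by hand, without the lexer:
handler = ReemittingHandler.new
handler.start_tag('<div>')
handler.text('My')
handler.character(' ', 32)
handler.text('Text')
handler.end_tag('</div>')
puts handler.output
```

A tool that needs to modify the view would override the relevant callbacks, for example text, instead of passing the value through.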
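Answer 1 (score: 2)
I ran into a similar problem recently. The approach I took was to write a small script (erblint.rb) that performs string substitution to convert the ERB tags (<% %> and <%= %>) into XML tags, and then parses the result with Nokogiri.
See the following code to see what I mean: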
#!/usr/bin/env ruby
require 'rubygems'
require 'nokogiri'

# This is a simple program that reads in a Ruby ERB file, and parses
# it as an XHTML file. Specifically, it makes a decent attempt at
# converting the ERB tags (<% %> and <%= %>) to XML tags (<erb-eval/>
# and <erb-disp/> respectively).
#
# Once the document has been parsed, it will be validated and any
# error messages will be displayed.
#
# More complex option and error handling is left as an exercise to the user.

abort 'Usage: erblint.rb <filename>' if ARGV.empty?

filename = ARGV[0]

begin
  puts "\n*** Parsing #{filename} ***\n\n"
  s = File.read(filename)

  # Substitute the standard ERB tags to convert them to XML tags
  #   <%= ... %> for <erb-disp> ... </erb-disp>
  #   <% ... %>  for <erb-eval> ... </erb-eval>
  # The <%= substitution must run first, because the second pattern
  # would otherwise match <%= tags as well.
  #
  # Note that this won't work for more complex expressions such as:
  #   <a href=<% @some_object.generate_url -%> >link text</a>
  # Of course, this is not great style, anyway...
  s.gsub!(/<%=(.+?)%>/m, '<erb-disp>\1</erb-disp>')
  s.gsub!(/<%(.+?)%>/m, '<erb-eval>\1</erb-eval>')

  doc = Nokogiri::XML(s) do |config|
    # put more config options here if required
    # config.strict
  end

  puts doc.to_xhtml(:indent => 2, :encoding => 'UTF-8')
  puts "Huzzah, no errors!" if doc.errors.empty?
  # Otherwise, print each error message
  doc.errors.each { |e| puts "Error at line #{e.line}: #{e}" }
rescue SystemCallError => e
  # Only file-system failures mean the file could not be opened
  puts "Oops! Cannot open #{filename}: #{e.message}"
end
I have posted this as a gist on GitHub: https://gist.github.com/787145
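To see what the substitution step produces, here is a small standalone sketch that applies it to the example from the question. It is a variation on the script above and not the same code: it uses one regular expression for both tag kinds, and it also escapes &, < and > inside the Ruby code, since an expression such as a < b would otherwise not be well-formed XML once wrapped in a tag. The helper name erb_to_xml and the constant XML_ESCAPES are hypothetical, and the sketch uses only core Ruby, so Nokogiri is not needed to try it.

```ruby
# Hypothetical helper, not part of the original script.
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

# Replace <%= ... %> with <erb-disp>...</erb-disp> and
# <% ... %> with <erb-eval>...</erb-eval>, escaping the Ruby code inside.
def erb_to_xml(source)
  source.gsub(/<%(=?)(.+?)%>/m) do
    # Read both captures before running the inner gsub, which resets them.
    marker = Regexp.last_match(1)
    code   = Regexp.last_match(2)
    tag    = marker == '=' ? 'erb-disp' : 'erb-eval'
    "<#{tag}>#{code.gsub(/[&<>]/, XML_ESCAPES)}</#{tag}>"
  end
end

input = 'blah blah blah <div>My Great Text <%= my_dynamic_expression %></div>'
puts erb_to_xml(input)
puts erb_to_xml('<% if a < b %>')
```

The limitation noted in the script's comments still applies: an ERB tag placed inside an HTML attribute or tag will not yield well-formed XML with this approach.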