Does css_parser parse inline CSS correctly?

Time: 2011-12-27 21:53:40

Tags: css ruby parsing

I have an HTML file with inline CSS:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
<HEAD>
<TITLE>Page 1</TITLE>

<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<DIV style="position:relative;width:612;height:792;">
<STYLE type="text/css">
.ft0{font-size:108px;font-family:Helvetica;color:#000000;}
.ft1{font-size:16px;font-family:Times;color:#000000; }
</STYLE>
</HEAD>
<BODY bgcolor="#A0A0A0" vlink="blue" link="blue">
<DIV style="position:absolute;top:457;left:225"><nobr><span class="ft0">Sample</span>   </nobr></DIV>
<DIV style="position:absolute;top:62;left:241"><nobr><span class="ft1"><b>HTML</b></span></nobr></DIV>
</BODY>
</HTML>

I'm trying to parse the inline CSS with Ruby's css_parser library. Note that the inline CSS defines two classes, .ft0 and .ft1.

My code is:

require 'css_parser'
parser = CssParser::Parser.new
parser.load_file!('filename.html')
puts parser.to_s

Which outputs:

<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">\n<HTML>\n<HEAD>   \n<TITLE>Page 1</TITLE>\n<META http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">\n<DIV style=\"position:relative;width:612;height:792;\">\n<STYLE type=\"text/css\">\n.ft0 {\nfont-size: 108px; font-family: Helvetica; color: #000000;\n}\n.ft1 {\nfont-size: 16px; font-family: Times; color: #000000;\n}\n" 

When I do this:

parser.find_by_selector(".ft0") 

it returns an empty array.

It looks as if css_parser treats the entire string

<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">\n<HTML>\n<HEAD>\n<TITLE>Page 1</TITLE>\n<META http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">\n<DIV style=\"position:relative;width:612;height:792;\">\n<STYLE type=\"text/css\">\n.ft0

as the selector, rather than just the class .ft0.

Is there a way to fix this so that it finds just the class .ft0?

1 answer:

Answer 0: (score: 2)

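CssParser doesn't look for CSS inside HTML; it expects nothing but stylesheet definitions. You need to extract the CSS from the HTML first and then pass it to CssParser.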

This might get you started:

require 'nokogiri'
require 'css_parser'

html = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
<HEAD>
<TITLE>Page 1</TITLE>

<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<DIV style="position:relative;width:612;height:792;">
<STYLE type="text/css">
.ft0{font-size:108px;font-family:Helvetica;color:#000000;}
.ft1{font-size:16px;font-family:Times;color:#000000; }
</STYLE>
</HEAD>
<BODY bgcolor="#A0A0A0" vlink="blue" link="blue">
<DIV style="position:absolute;top:457;left:225"><nobr><span class="ft0">Sample</span>   </nobr></DIV>
<DIV style="position:absolute;top:62;left:241"><nobr><span class="ft1"><b>HTML</b></span></nobr></DIV>
</BODY>
</HTML>
'

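# Parse the HTML so the <style> element can be located.
doc = Nokogiri::HTML(html)

# Take only the stylesheet text and hand that to CssParser.
stylesheet = doc.at('style').content
parser = CssParser::Parser.new
parser.add_block!(stylesheet)
puts parser.find_by_selector(".ft0")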

Which outputs:

font-size: 108px; font-family: Helvetica; color: #000000;
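
If the page has more than one <style> element, or you would rather not pull in Nokogiri just for the extraction step, the same idea can be sketched with the standard library alone. This is only an illustration, not part of the answer above: it swaps Nokogiri for a regular expression, which copes with simple markup like the sample but is not a real HTML parser. The helper name extract_stylesheets and the STYLE_BLOCK constant are hypothetical.

```ruby
# Hypothetical helper (not part of css_parser or Nokogiri):
# collect the text of every <style> element in an HTML string.
# Assumes simple, well-formed markup; a regex is not a full HTML parser.
STYLE_BLOCK = %r{<style\b[^>]*>(.*?)</style>}mi

def extract_stylesheets(html)
  # scan returns one single-element array per match; flatten to plain strings.
  html.scan(STYLE_BLOCK).flatten.map(&:strip).join("\n")
end

html = <<~HTML
  <HTML>
  <HEAD>
  <STYLE type="text/css">
  .ft0{font-size:108px;font-family:Helvetica;color:#000000;}
  .ft1{font-size:16px;font-family:Times;color:#000000; }
  </STYLE>
  </HEAD>
  <BODY><span class="ft0">Sample</span></BODY>
  </HTML>
HTML

# Only the CSS rules remain; none of the surrounding HTML is included.
stylesheet = extract_stylesheets(html)
puts stylesheet
```

The resulting string can be passed to parser.add_block!(stylesheet) exactly as in the answer. For a file on disk, read it first with File.read('filename.html') and pass the contents to the helper.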