Parsing grouped table rows with Nokogiri

Posted: 2014-01-12 19:06:10

Tags: ruby xpath nokogiri

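I am trying to parse a table with Nokogiri in Ruby 2.1.0. Here is my table. As you can see, there is a table inside the table, and it is an element of the parent table. I want to parse every two <tr> tags together. How can I do that?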

In short, I want to first group every two <tr> elements together, and then get the table inside the <td> of the second <tr>.

RUBY CODE:

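require 'nokogiri'

# Nokogiri::HTML lowercases tag names, so '//table' also matches <TABLE>.
doc = Nokogiri::HTML(File.read('bannerweb.html'))

# Direct <tr> children of the third table: rows alternate between a
# course-title row (<th>) and a detail row (<td>).
items = doc.xpath('(//table)[3]/tr')

items.each do |row|
  link = row.at_xpath('th/a')
  if link
    # The title looks like "Name - CRN - Subject Number - Section".
    link.text.split(' - ').each { |item| p item }
  end

  info = row.at_xpath('td')
  if info
    # The nested table is a child of the <td>; its <tr> elements are not
    # direct children of the <td>, so they are reached via table/tr.
    info.xpath('table/tr').each { |tr| p tr.text.strip }
  end
end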

HTML CODE:

<TR>
<TH CLASS="ddlabel" scope="row" ><A HREF="/prod/bwckschd.p_disp_detail_sched?term_in=201302&amp;crn_in=20500">Introduction to Financial Accounting and Reporting - 20500 - ACC 201 - A</A><BR><BR></TH>
</TR>
<TR>
<TD CLASS="dddefault">
<SPAN class="fieldlabeltext">Associated Term: </SPAN>Spring 2013-2014 
<BR>
<SPAN class="fieldlabeltext">Registration Dates: </SPAN> No dates available  
<BR>
<SPAN class="fieldlabeltext">Levels: </SPAN>Undeclared, Doctorate, Masters, Exchange - Erasmus Mundus DR, Exchange - Erasmus Mundus MA, Exchange - Erasmus Mundus UG, Special, Scientific Preparatory, Undergraduate, Exchange - Socrates Erasmus DR, Exchange - Socrates Erasmus MA, Exchange - Socrates Erasmus UG 
<BR>
<SPAN class="fieldlabeltext">Faculty: </SPAN>
Course Offered by FMAN
<BR>
<SPAN class="fieldlabeltext">Attributes: </SPAN>Lang. of Instruction: English, 6 ECTS, Course Offered by SOM 
<BR>
<SPAN class="fieldlabeltext">Instructors: </SPAN>Diğdem Dikkaya Koloğlu (<ABBR title= "Primary">P</ABBR>) 
<BR>
<BR>
Sabancı University Campus Campus
<BR>
Lecture Schedule Type
<BR>
TR Instructional Method
<BR>
   3.000 Credits
<BR>
<A HREF="/prod/bwckctlg.p_display_courses?term_in=201302&amp;one_subj=ACC&amp;sel_crse_strt=201&amp;sel_crse_end=201&amp;sel_subj=&amp;sel_levl=&amp;sel_schd=&amp;sel_coll=&amp;sel_divs=&amp;sel_dept=&amp;sel_attr=">View Catalog Entry</A>
<BR>
<BR>
<TABLE  CLASS="datadisplaytable" SUMMARY="This table lists the scheduled meeting times and assigned instructors for this class.."><CAPTION class="captiontext">Scheduled Meeting Times</CAPTION>
<TR>
<TH CLASS="ddheader" scope="col" >Type</TH>
<TH CLASS="ddheader" scope="col" >Time</TH>
<TH CLASS="ddheader" scope="col" >Days</TH>
<TH CLASS="ddheader" scope="col" >Where</TH>
<TH CLASS="ddheader" scope="col" >Date Range</TH>
<TH CLASS="ddheader" scope="col" >Schedule Type</TH>
<TH CLASS="ddheader" scope="col" >Instructors</TH>
</TR>
<TR>
<TD CLASS="dddefault">Class</TD>
<TD CLASS="dddefault">10:40 am - 12:30 pm</TD>
<TD CLASS="dddefault">T</TD>
<TD CLASS="dddefault">School of Management G060</TD>
<TD CLASS="dddefault">Feb 10, 2014 - May 23, 2014</TD>
<TD CLASS="dddefault">1st del</TD>
<TD CLASS="dddefault">Diğdem Dikkaya Koloğlu (<ABBR title= "Primary">P</ABBR>)<A HREF="mailto:digdem@sabanciuniv.edu"    target="Diğdem Dikkaya Koloğlu" ><IMG SRC="https://suisimg.sabanciuniv.edu/wtlgifs/web_email.gif" ALIGN="middle" ALT="E-mail" CLASS="headerImg" TITLE="E-mail"  NAME="web_email" HSPACE=0 VSPACE=0 BORDER=0 HEIGHT=28 WIDTH=28></A></TD>
</TR>
<TR>
<TD CLASS="dddefault">Class</TD>
<TD CLASS="dddefault">2:40 pm - 3:30 pm</TD>
<TD CLASS="dddefault">R</TD>
<TD CLASS="dddefault">School of Management G060</TD>
<TD CLASS="dddefault">Feb 10, 2014 - May 23, 2014</TD>
<TD CLASS="dddefault">2nd del</TD>
<TD CLASS="dddefault">Diğdem Dikkaya Koloğlu (<ABBR title= "Primary">P</ABBR>)<A HREF="mailto:digdem@sabanciuniv.edu"    target="Diğdem Dikkaya Koloğlu" ><IMG SRC="https://suisimg.sabanciuniv.edu/wtlgifs/web_email.gif" ALIGN="middle" ALT="E-mail" CLASS="headerImg" TITLE="E-mail"  NAME="web_email" HSPACE=0 VSPACE=0 BORDER=0 HEIGHT=28 WIDTH=28></A></TD>
</TR>
</TABLE>
<BR>
<BR>
</TD>
</TR>

0 answers:

No answers yet.