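I am trying to parse a table with Nokogiri in Ruby 2.1.0. Here is my table. As you can see, there is a table nested inside the table. The HTML below is one element of the parent table. I want to parse every two tags together. How can I do that?
In short, I want to first group the tr elements in pairs, then get the table inside the second tr of each pair.
RUBY CODE: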
require 'nokogiri'
require 'sanitize'
doc = Nokogiri(File.open('bannerweb.html'))
items = doc.xpath('(//table)[3]/tr')
items.collect do |row|
  if row.at_xpath('th/a/text()') != nil
    course = row.at_xpath('th/a/text()').text
    course.split(' - ').each do |item|
      p item
    end
  end
  if row.at_xpath('td') != nil
    info = row.at_xpath('td')
    table = info.at_xpath('tr')
    p table.inspect
  end
end
HTML CODE:
<TR>
<TH CLASS="ddlabel" scope="row" ><A HREF="/prod/bwckschd.p_disp_detail_sched?term_in=201302&crn_in=20500">Introduction to Financial Accounting and Reporting - 20500 - ACC 201 - A</A><BR><BR></TH>
</TR>
<TR>
<TD CLASS="dddefault">
<SPAN class="fieldlabeltext">Associated Term: </SPAN>Spring 2013-2014
<BR>
<SPAN class="fieldlabeltext">Registration Dates: </SPAN> No dates available
<BR>
<SPAN class="fieldlabeltext">Levels: </SPAN>Undeclared, Doctorate, Masters, Exchange - Erasmus Mundus DR, Exchange - Erasmus Mundus MA, Exchange - Erasmus Mundus UG, Special, Scientific Preparatory, Undergraduate, Exchange - Socrates Erasmus DR, Exchange - Socrates Erasmus MA, Exchange - Socrates Erasmus UG
<BR>
<SPAN class="fieldlabeltext">Faculty: </SPAN>
Course Offered by FMAN
<BR>
<SPAN class="fieldlabeltext">Attributes: </SPAN>Lang. of Instruction: English, 6 ECTS, Course Offered by SOM
<BR>
<SPAN class="fieldlabeltext">Instructors: </SPAN>Diğdem Dikkaya Koloğlu (<ABBR title= "Primary">P</ABBR>)
<BR>
<BR>
Sabancı University Campus Campus
<BR>
Lecture Schedule Type
<BR>
TR Instructional Method
<BR>
3.000 Credits
<BR>
<A HREF="/prod/bwckctlg.p_display_courses?term_in=201302&one_subj=ACC&sel_crse_strt=201&sel_crse_end=201&sel_subj=&sel_levl=&sel_schd=&sel_coll=&sel_divs=&sel_dept=&sel_attr=">View Catalog Entry</A>
<BR>
<BR>
<TABLE CLASS="datadisplaytable" SUMMARY="This table lists the scheduled meeting times and assigned instructors for this class.."><CAPTION class="captiontext">Scheduled Meeting Times</CAPTION>
<TR>
<TH CLASS="ddheader" scope="col" >Type</TH>
<TH CLASS="ddheader" scope="col" >Time</TH>
<TH CLASS="ddheader" scope="col" >Days</TH>
<TH CLASS="ddheader" scope="col" >Where</TH>
<TH CLASS="ddheader" scope="col" >Date Range</TH>
<TH CLASS="ddheader" scope="col" >Schedule Type</TH>
<TH CLASS="ddheader" scope="col" >Instructors</TH>
</TR>
<TR>
<TD CLASS="dddefault">Class</TD>
<TD CLASS="dddefault">10:40 am - 12:30 pm</TD>
<TD CLASS="dddefault">T</TD>
<TD CLASS="dddefault">School of Management G060</TD>
<TD CLASS="dddefault">Feb 10, 2014 - May 23, 2014</TD>
<TD CLASS="dddefault">1st del</TD>
<TD CLASS="dddefault">Diğdem Dikkaya Koloğlu (<ABBR title= "Primary">P</ABBR>)<A HREF="mailto:digdem@sabanciuniv.edu" target="Diğdem Dikkaya Koloğlu" ><IMG SRC="https://suisimg.sabanciuniv.edu/wtlgifs/web_email.gif" ALIGN="middle" ALT="E-mail" CLASS="headerImg" TITLE="E-mail" NAME="web_email" HSPACE=0 VSPACE=0 BORDER=0 HEIGHT=28 WIDTH=28></A></TD>
</TR>
<TR>
<TD CLASS="dddefault">Class</TD>
<TD CLASS="dddefault">2:40 pm - 3:30 pm</TD>
<TD CLASS="dddefault">R</TD>
<TD CLASS="dddefault">School of Management G060</TD>
<TD CLASS="dddefault">Feb 10, 2014 - May 23, 2014</TD>
<TD CLASS="dddefault">2nd del</TD>
<TD CLASS="dddefault">Diğdem Dikkaya Koloğlu (<ABBR title= "Primary">P</ABBR>)<A HREF="mailto:digdem@sabanciuniv.edu" target="Diğdem Dikkaya Koloğlu" ><IMG SRC="https://suisimg.sabanciuniv.edu/wtlgifs/web_email.gif" ALIGN="middle" ALT="E-mail" CLASS="headerImg" TITLE="E-mail" NAME="web_email" HSPACE=0 VSPACE=0 BORDER=0 HEIGHT=28 WIDTH=28></A></TD>
</TR>
</TABLE>
<BR>
<BR>
</TD>
</TR>