I have the following messy HTML table, which displays a list of records.
<table><tbody> <tr id="RECORD_1">
<td valign="top" class="summary_recnum"><input value="1" name="marked_list_candidates" type="checkbox"> 1. <div id="ml_indicator_1">
</div>
<div id="enw_link_1">
</div>
</td><td class="summary_data"><div>
<span class="label">Title: </span><a class="smallV110" href="/full_record.do?product=UA&search_mode=GeneralSearch&qid=2&SID=2DI1PEg5Ja24IHi95Fc&page=1&doc=1" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true">
<value lang_id="">A Multitier System for the Verification, Visualization and Management of CHIMERA</value>
</a>
</div>
<div>
<span class="label">Author(s): </span>Lingerfelt E. J.; Messer O. E. B.; Osborne J. A.; et al.</div>
<div>
<span class="label">Editor(s): </span>Sato M; Matsuoka S; Sloot PMA; et al.</div>
<div>
<span class="label">Conference:
</span> <span class="data_bold">
<value>International Conference on Computational Science (ICCS) on the Ascent of Computational Excellence</value>
</span> <span class="label">Location: </span><span class="data_bold">Campus Nanyang Technolog Univ, Singapore, SINGAPORE</span> <span class="label">Date: </span><span class="data_bold">2011</span>
<br>
<span class="label">Sponsor(s): </span><span class="data_bold">Elsevier; Univ Tsukuba, Ctr Computat Sci</span>
</div>
<span class="label">Source: </span>PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE (ICCS) <span class="label">Book Series: </span><span class="data_bold">Procedia Computer Science</span> <span class="label">Volume: </span><span class="data_bold">4</span> <span class="label">Pages: </span><span class="data_bold">2076-2085</span> <span class="label">DOI: </span><span class="data_bold">10.1016/j.procs.2011.04.227</span> <span class="label">Published: </span><span class="data_bold">2011</span>
<div>
<span class="label">Times Cited: </span><span class="data_bold">0</span> (from All Databases) </div>
<br>
<div style="display: inline-block" id="links_1">
<nobr><span id="links_openurl_1"> <a href="javascript:;" onclick="return open_location('OutboundService.do?action=go&mode=fastOpenUrl&SID=2DI1PEg5Ja24IHi95Fc&product=UA&qid=2&doc=1&publisher_id=Oak_Ridge_National_Lab_UT_Battelle_LLC_open&recordID=','openurl');" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true"> <img src="http://sfx.ornl.gov/ornl/sfx.gif" border="0" alt="Context Sensitive Links" title="Context Sensitive Links"> </a> </span><span id="links_full_text_1"> </span><span id="links_doc_del_1"> </span><span id="links_patent_1"> </span></nobr>
</div>
<span style="display: inline" class="ViewAbstract1_text" id="ViewAbstract1_text">
[
<a title="View the abstract" alt="View the abstract" onclick="return hide_show_abstract('1', 'http://images.webofknowledge.com/WOKRS56B5/images/collapse.gif', 'http://images.webofknowledge.com/WOKRS56B5/images/expand.gif', 'View the abstract', 'Hide the abstract');" href="javascript:;" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true"><img align="absmiddle" title="View the abstract" alt="View the abstract" src="http://images.webofknowledge.com/WOKRS56B5/images/expand.gif" id="ViewAbstract1_img">View abstract</a>
]
</span><span style="display: none" class="HideAbstract1_text" id="HideAbstract1_text">
[
<a title="Hide the abstract" alt="Hide the abstract" onclick="return hide_show_abstract('1', 'http://images.webofknowledge.com/WOKRS56B5/images/collapse.gif', 'http://images.webofknowledge.com/WOKRS56B5/images/expand.gif', 'View the abstract', 'Hide the abstract');" href="javascript:;" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true"><img align="absmiddle" title="Hide the abstract" alt="Hide the abstract" src="http://images.webofknowledge.com/WOKRS56B5/images/collapse.gif" id="HideAbstract1_img">Hide abstract</a>
]
</span><span style="display: none" url="http://apps.webofknowledge.com/ViewAbstract.do?product=UA&search_mode=GeneralSearch&viewType=ViewAbstract&qid=2&SID=2DI1PEg5Ja24IHi95Fc&page=1&doc=1" id="ViewAbstract_Span1">
<!----></span></td></tr><tr id="RECORD_2">
<td valign="top" class="summary_recnum"><input value="2" name="marked_list_candidates" type="checkbox"> 2. <div id="ml_indicator_2">
</div>
<div id="enw_link_2">
</div>
</td><td class="summary_data"><div>
<span class="label">Title: </span><a class="smallV110" href="/full_record.do?product=UA&search_mode=GeneralSearch&qid=2&SID=2DI1PEg5Ja24IHi95Fc&page=1&doc=2" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true">
<value lang_id="">Gravitational waves from core collapse supernovae</value>
</a>
</div>
<div>
<span class="label">Author(s): </span>Yakunin Konstantin N.; Marronetti Pedro; <span class="hitHilite">Mezzacappa Anthony</span>; et al.</div>
<div>
<span class="label">Conference:
</span> <span class="data_bold">
<value>14th Gravitational Wave Data Analysis Workshop (GWDAW-14)</value>
</span> <span class="label">Location: </span><span class="data_bold">Univ Rome, Rome, ITALY</span> <span class="label">Date: </span><span class="data_bold">JAN 26-29, 2010</span>
</div>
<span class="label">Source: </span>CLASSICAL AND QUANTUM GRAVITY <span class="label">Volume: </span><span class="data_bold">27</span> <span class="label">Issue: </span><span class="data_bold">19</span> <span class="label">Special Issue: </span><span class="data_bold">SI</span> <span class="label">Article Number: </span><span class="data_bold">194005</span> <span class="label">DOI: </span><span class="data_bold">10.1088/0264-9381/27/19/194005</span> <span class="label">Published: </span><span class="data_bold">OCT 7 2010</span>
<div>
<span class="label">Times Cited: </span><a title="View all of the articles that cite this one" href="/CitingArticles.do?product=UA&SID=2DI1PEg5Ja24IHi95Fc&search_mode=CitingArticles&parentProduct=UA&parentQid=2&parentDoc=2&REFID=337695000&betterCount=7" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true">7</a> (from All Databases) </div>
<br>
<div style="display: inline-block" id="links_2">
<nobr><span id="links_openurl_2"> <a href="javascript:;" onclick="return open_location('OutboundService.do?action=go&mode=fastOpenUrl&SID=2DI1PEg5Ja24IHi95Fc&product=UA&qid=2&doc=2&publisher_id=Oak_Ridge_National_Lab_UT_Battelle_LLC_open&recordID=','openurl');" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true"> <img src="http://sfx.ornl.gov/ornl/sfx.gif" border="0" alt="Context Sensitive Links" title="Context Sensitive Links"> </a> </span><span id="links_full_text_2"> </span><span id="links_doc_del_2"> </span><span id="links_patent_2"> </span></nobr>
</div>
<span style="display: inline" class="ViewAbstract2_text" id="ViewAbstract2_text">
[
<a title="View the abstract" alt="View the abstract" onclick="return hide_show_abstract('2', 'http://images.webofknowledge.com/WOKRS56B5/images/collapse.gif', 'http://images.webofknowledge.com/WOKRS56B5/images/expand.gif', 'View the abstract', 'Hide the abstract');" href="javascript:;" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true"><img align="absmiddle" title="View the abstract" alt="View the abstract" src="http://images.webofknowledge.com/WOKRS56B5/images/expand.gif" id="ViewAbstract2_img">View abstract</a>
]
</span><span style="display: none" class="HideAbstract2_text" id="HideAbstract2_text">
[
<a title="Hide the abstract" alt="Hide the abstract" onclick="return hide_show_abstract('2', 'http://images.webofknowledge.com/WOKRS56B5/images/collapse.gif', 'http://images.webofknowledge.com/WOKRS56B5/images/expand.gif', 'View the abstract', 'Hide the abstract');" href="javascript:;" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true"><img align="absmiddle" title="Hide the abstract" alt="Hide the abstract" src="http://images.webofknowledge.com/WOKRS56B5/images/collapse.gif" id="HideAbstract2_img">Hide abstract</a>
]
</span><span style="display: none" url="http://apps.webofknowledge.com/ViewAbstract.do?product=UA&search_mode=GeneralSearch&viewType=ViewAbstract&qid=2&SID=2DI1PEg5Ja24IHi95Fc&page=1&doc=2" id="ViewAbstract_Span2">
<!----></span></td></tr><tr id="RECORD_3">
<td valign="top" class="summary_recnum"><input value="3" name="marked_list_candidates" type="checkbox"> 3. <div id="ml_indicator_3">
</div>
<div id="enw_link_3">
</div>
</td><td class="summary_data"><div>
<span class="label">Title: </span><a class="smallV110" href="/full_record.do?product=UA&search_mode=GeneralSearch&qid=2&SID=2DI1PEg5Ja24IHi95Fc&page=1&doc=3" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true">
<value lang_id="">Protoneutron star evolution and the neutrino-driven wind in general relativistic neutrino radiation hydrodynamics simulations</value>
</a>
</div>
<div>
<span class="label">Author(s): </span>Fischer T.; Whitehouse S. C.; <span class="hitHilite">Mezzacappa A</span>.; et al.</div>
<span class="label">Source: </span>ASTRONOMY & ASTROPHYSICS <span class="label">Volume: </span><span class="data_bold">517</span> <span class="label">Article Number: </span><span class="data_bold">A80</span> <span class="label">DOI: </span><span class="data_bold">10.1051/0004-6361/200913106</span> <span class="label">Published: </span><span class="data_bold">JUL 2010</span>
<div>
<span class="label">Times Cited: </span><a title="View all of the articles that cite this one" href="/CitingArticles.do?product=UA&SID=2DI1PEg5Ja24IHi95Fc&search_mode=CitingArticles&parentProduct=UA&parentQid=2&parentDoc=3&REFID=336434672&betterCount=40" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true">40</a> (from All Databases) </div>
<br>
<div style="display: inline-block" id="links_3">
<nobr><span id="links_openurl_3"> <a href="javascript:;" onclick="return open_location('OutboundService.do?action=go&mode=fastOpenUrl&SID=2DI1PEg5Ja24IHi95Fc&product=UA&qid=2&doc=3&publisher_id=Oak_Ridge_National_Lab_UT_Battelle_LLC_open&recordID=','openurl');" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true"> <img src="http://sfx.ornl.gov/ornl/sfx.gif" border="0" alt="Context Sensitive Links" title="Context Sensitive Links"> </a> </span><span id="links_full_text_3"> </span><span id="links_doc_del_3"> </span><span id="links_patent_3"> </span></nobr>
</div>
<span style="display: inline" class="ViewAbstract3_text" id="ViewAbstract3_text">
[
<a title="View the abstract" alt="View the abstract" onclick="return hide_show_abstract('3', 'http://images.webofknowledge.com/WOKRS56B5/images/collapse.gif', 'http://images.webofknowledge.com/WOKRS56B5/images/expand.gif', 'View the abstract', 'Hide the abstract');" href="javascript:;" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true"><img align="absmiddle" title="View the abstract" alt="View the abstract" src="http://images.webofknowledge.com/WOKRS56B5/images/expand.gif" id="ViewAbstract3_img">View abstract</a>
]
</span><span style="display: none" class="HideAbstract3_text" id="HideAbstract3_text">
[
<a title="Hide the abstract" alt="Hide the abstract" onclick="return hide_show_abstract('3', 'http://images.webofknowledge.com/WOKRS56B5/images/collapse.gif', 'http://images.webofknowledge.com/WOKRS56B5/images/expand.gif', 'View the abstract', 'Hide the abstract');" href="javascript:;" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true"><img align="absmiddle" title="Hide the abstract" alt="Hide the abstract" src="http://images.webofknowledge.com/WOKRS56B5/images/collapse.gif" id="HideAbstract3_img">Hide abstract</a>
]
</span><span style="display: none" url="http://apps.webofknowledge.com/ViewAbstract.do?product=UA&search_mode=GeneralSearch&viewType=ViewAbstract&qid=2&SID=2DI1PEg5Ja24IHi95Fc&page=1&doc=3" id="ViewAbstract_Span3">
<!----></span></td></tr><tr id="RECORD_4">
<td valign="top" class="summary_recnum"><input value="4" name="marked_list_candidates" type="checkbox"> 4. <div id="ml_indicator_4">
</div>
<div id="enw_link_4">
</div>
</td><td class="summary_data"><div>
<span class="label">Title: </span><a class="smallV110" href="/full_record.do?product=UA&search_mode=GeneralSearch&qid=2&SID=2DI1PEg5Ja24IHi95Fc&page=1&doc=4" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true">
<value lang_id="">GENERATION OF MAGNETIC FIELDS BY THE STATIONARY ACCRETION SHOCK INSTABILITY</value>
</a>
</div>
<div>
<span class="label">Author(s): </span>Endeve Eirik; Cardall Christian Y.; Budiardja Reuben D.; et al.</div>
<span class="label">Source: </span>ASTROPHYSICAL JOURNAL <span class="label">Volume: </span><span class="data_bold">713</span> <span class="label">Issue: </span><span class="data_bold">2</span> <span class="label">Pages: </span><span class="data_bold">1219-1243</span> <span class="label">DOI: </span><span class="data_bold">10.1088/0004-637X/713/2/1219</span> <span class="label">Published: </span><span class="data_bold">APR 20 2010</span>
<div>
<span class="label">Times Cited: </span><a title="View all of the articles that cite this one" href="/CitingArticles.do?product=UA&SID=2DI1PEg5Ja24IHi95Fc&search_mode=CitingArticles&parentProduct=UA&parentQid=2&parentDoc=4&REFID=292857312&betterCount=6" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true">6</a> (from All Databases) </div>
<br>
<div style="display: inline-block" id="links_4">
<nobr><span id="links_openurl_4"> <a href="javascript:;" onclick="return open_location('OutboundService.do?action=go&mode=fastOpenUrl&SID=2DI1PEg5Ja24IHi95Fc&product=UA&qid=2&doc=4&publisher_id=Oak_Ridge_National_Lab_UT_Battelle_LLC_open&recordID=','openurl');" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true"> <img src="http://sfx.ornl.gov/ornl/sfx.gif" border="0" alt="Context Sensitive Links" title="Context Sensitive Links"> </a> </span><span id="links_full_text_4"> </span><span id="links_doc_del_4"> </span><span id="links_patent_4"> </span></nobr>
</div>
<span style="display: inline" class="ViewAbstract4_text" id="ViewAbstract4_text">
[
<a title="View the abstract" alt="View the abstract" onclick="return hide_show_abstract('4', 'http://images.webofknowledge.com/WOKRS56B5/images/collapse.gif', 'http://images.webofknowledge.com/WOKRS56B5/images/expand.gif', 'View the abstract', 'Hide the abstract');" href="javascript:;" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true"><img align="absmiddle" title="View the abstract" alt="View the abstract" src="http://images.webofknowledge.com/WOKRS56B5/images/expand.gif" id="ViewAbstract4_img">View abstract</a>
]
</span><span style="display: none" class="HideAbstract4_text" id="HideAbstract4_text">
[
<a title="Hide the abstract" alt="Hide the abstract" onclick="return hide_show_abstract('4', 'http://images.webofknowledge.com/WOKRS56B5/images/collapse.gif', 'http://images.webofknowledge.com/WOKRS56B5/images/expand.gif', 'View the abstract', 'Hide the abstract');" href="javascript:;" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true"><img align="absmiddle" title="Hide the abstract" alt="Hide the abstract" src="http://images.webofknowledge.com/WOKRS56B5/images/collapse.gif" id="HideAbstract4_img">Hide abstract</a>
]
</span><span style="display: none" url="http://apps.webofknowledge.com/ViewAbstract.do?product=UA&search_mode=GeneralSearch&viewType=ViewAbstract&qid=2&SID=2DI1PEg5Ja24IHi95Fc&page=1&doc=4" id="ViewAbstract_Span4">
<!----></span></td></tr><tr id="RECORD_5">
<td valign="top" class="summary_recnum"><input value="5" name="marked_list_candidates" type="checkbox"> 5. <div id="ml_indicator_5">
</div>
<div id="enw_link_5">
</div>
</td><td class="summary_data"><div>
<span class="label">Title: </span><a class="smallV110" href="/full_record.do?product=UA&search_mode=GeneralSearch&qid=2&SID=2DI1PEg5Ja24IHi95Fc&page=1&doc=5" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true">
<value lang_id="">Understanding Core-Collapse Supernovae</value>
</a>
</div>
<div>
<span class="label">Author(s): </span>Hix W. R.; Lentz E. J.; Baird M.; et al.</div>
<div>
<span class="label">Conference:
</span> <span class="data_bold">
<value>10th International Conference on Nucleus-Nucleus Collisions (NN2009)</value>
</span> <span class="label">Location: </span><span class="data_bold">Beijing, PEOPLES R CHINA</span> <span class="label">Date: </span><span class="data_bold">AUG 16-21, 2009</span>
<br>
<span class="label">Sponsor(s): </span><span class="data_bold">China Inst Atom Energy</span>
</div>
<span class="label">Source: </span>NUCLEAR PHYSICS A <span class="label">Volume: </span><span class="data_bold">834</span> <span class="label">Issue: </span><span class="data_bold">1-4</span> <span class="label">Pages: </span><span class="data_bold">602C-607C</span> <span class="label">DOI: </span><span class="data_bold">10.1016/j.nuclphysa.2010.01.104</span> <span class="label">Published: </span><span class="data_bold">MAR 1 2010</span>
<div>
<span class="label">Times Cited: </span><span class="data_bold">0</span> (from All Databases) </div>
<br>
<div style="display: inline-block" id="links_5">
<nobr><span id="links_openurl_5"> <a href="javascript:;" onclick="return open_location('OutboundService.do?action=go&mode=fastOpenUrl&SID=2DI1PEg5Ja24IHi95Fc&product=UA&qid=2&doc=5&publisher_id=Oak_Ridge_National_Lab_UT_Battelle_LLC_open&recordID=','openurl');" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true"> <img src="http://sfx.ornl.gov/ornl/sfx.gif" border="0" alt="Context Sensitive Links" title="Context Sensitive Links"> </a> </span><span id="links_full_text_5"> </span><span id="links_doc_del_5"> </span><span id="links_patent_5"> </span></nobr>
</div>
<span style="display: inline" class="ViewAbstract5_text" id="ViewAbstract5_text">
[
<a title="View the abstract" alt="View the abstract" onclick="return hide_show_abstract('5', 'http://images.webofknowledge.com/WOKRS56B5/images/collapse.gif', 'http://images.webofknowledge.com/WOKRS56B5/images/expand.gif', 'View the abstract', 'Hide the abstract');" href="javascript:;" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true"><img align="absmiddle" title="View the abstract" alt="View the abstract" src="http://images.webofknowledge.com/WOKRS56B5/images/expand.gif" id="ViewAbstract5_img">View abstract</a>
]
</span><span style="display: none" class="HideAbstract5_text" id="HideAbstract5_text">
[
<a title="Hide the abstract" alt="Hide the abstract" onclick="return hide_show_abstract('5', 'http://images.webofknowledge.com/WOKRS56B5/images/collapse.gif', 'http://images.webofknowledge.com/WOKRS56B5/images/expand.gif', 'View the abstract', 'Hide the abstract');" href="javascript:;" oncontextmenu="javascript:return IsAllowedRightClick(this);" hasautosubmit="true"><img align="absmiddle" title="Hide the abstract" alt="Hide the abstract" src="http://images.webofknowledge.com/WOKRS56B5/images/collapse.gif" id="HideAbstract5_img">Hide abstract</a>
]
</span><span style="display: none" url="http://apps.webofknowledge.com/ViewAbstract.do?product=UA&search_mode=GeneralSearch&viewType=ViewAbstract&qid=2&SID=2DI1PEg5Ja24IHi95Fc&page=1&doc=5" id="ViewAbstract_Span5">
<!----></span></td></tr>
<input type="hidden" name="all_summary_IDs" value=""><input type="hidden" name="viewAbstractUrl" value="http://apps.webofknowledge.com/ViewAbstract.do?product=UA&search_mode=GeneralSearch&viewType=ViewAbstract&qid=2&SID=2DI1PEg5Ja24IHi95Fc&page=1&"> <input type="hidden" name="LinksAreAllowedRightClick" value="full_record.do"> <input type="hidden" name="LinksAreAllowedRightClick" value="CitingArticles.do"> <input type="hidden" name="LinksAreAllowedRightClick" value="CitedPatent.do">
</tbody></table>
I am interested in the contents of td.summary_data in each row, and I tried to parse the table with HTML::TableExtract:
my $te = HTML::TableExtract->new(headers => ["Title"]);
$te->parse($html_string);
# Examine all matching tables
my $count = 1;
foreach my $ts ($te->tables) {
    #print "\n";
    #print "Table (", join(',', $ts->coords), "):\n";
    foreach my $row ($ts->rows) {
        print "$count\n";
        for my $cell (@$row) {
            $cell =~ s/^\s+//;
            $cell =~ s/\s+\z/;/;
            $cell =~ s/\s+/ /g;
        }
        print join("|", @$row), "\n";
        print "\n";
        $count++;
    }
}
Result:
1
Use of uninitialized value $cell in substitution (s///) at test2.pl line 20.
Use of uninitialized value $cell in substitution (s///) at test2.pl line 21.
Use of uninitialized value $cell in substitution (s///) at test2.pl line 22.
Use of uninitialized value $row in join or string at test2.pl line 24.
2
Title: Extreme Scaling of Production Visualization Software on Diverse Architectures Author(s): Childs Hank; Pugmire David; Ahern Sean; et al. Source: IEEE COMPUTER GRAPHICS AND APPLICATIONS??Volume: 30 ??Issue: 3 ??Pages: 22-31 ??Published: MAY-JUN 2010 Times Cited: 2 (from All Databases);
3
Title: Coupling visualization and data analysis for knowledge discovery from multi-dimensional scientific data Author(s): Ruebel Oliver; Ahern Sean; Bethel E. Wes; et al. Book Author(s): Sloot, PMA; Albada, GDV; Dongarra, J Book Group Author(s): ICCS Conference: International Conference on Computational Science (ICCS) Location: Univ Amsterdam, Amsterdam, NETHERLANDS Date: MAY 31-JUN 02, 2010 Sponsor(s): NWO, Netherlands Org Sci Res; KNAW, Royal Netherlands Acad Arts & Sci; Elsevier B V; Univ Amsterdam Source: ICCS 2010 - INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE, PROCEEDINGS??Book Series: Procedia Computer Science ??Volume: 1 ??Issue: 1 ??Pages: 1751-1758 ??DOI: 10.1016/j.procs.2010.04.197 ??Published: 2010 Times Cited: 0 (from All Databases) [ View abstract ] [ Hide abstract ];
How can I get the contents of td.summary_data in each row of this table, so that I can extract the information I am interested in?
Answer 0 (score: 3)
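Your table has no headers. It is not really a data table at all: the page's author used a table for layout. You can still extract the information you need, but the conveniences of HTML::TableExtract are not available when a table is used for visual layout rather than for tabular data.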
#!/usr/bin/env perl

use HTML::TokeParser::Simple;

my $parser = HTML::TokeParser::Simple->new(file => 'tt.html');

# Walk every <td> start tag and keep only those with class="summary_data"
while (my $tag = $parser->get_tag('td')) {
    my $class = $tag->get_attr('class');
    next unless defined $class;
    next unless $class eq 'summary_data';
    # Collect the text up to the closing </td>
    my $text = $parser->get_text('/td');
    # do something with the contents of the table cell here
    process_record( \$text );
}

sub process_record {
    # receives a reference to the cell text; left empty as a placeholder
}
I left out the standard preamble because I am not sure what your input encoding is, but make sure the streams are set up correctly before you create $parser.
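
As one possible way to fill in process_record, here is a minimal sketch that splits the cell text into label/value pairs. It is not part of the original answer. The label list is an assumption taken from the labels visible in the sample markup, and returning a hash reference is my own choice for illustration. It uses only core Perl and expects the same reference to the cell text that the loop above passes in.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Labels seen in the sample markup (an assumption: other pages may use more).
my @LABELS = (
    'Title', 'Author(s)', 'Editor(s)', 'Conference', 'Location', 'Date',
    'Sponsor(s)', 'Source', 'Book Series', 'Volume', 'Issue',
    'Special Issue', 'Article Number', 'Pages', 'DOI', 'Published',
    'Times Cited',
);
my $label_re = join '|', map { quotemeta } @LABELS;

# Hypothetical body for process_record: takes a reference to the cell text
# and returns a hash reference mapping each label to its value.
sub process_record {
    my ($text_ref) = @_;

    # Work on a copy and collapse runs of whitespace (including newlines).
    (my $text = $$text_ref) =~ s/\s+/ /g;

    my %field;
    # A value runs from just after "Label:" up to the next known label
    # or the end of the text.
    while ($text =~ /\b($label_re):\s*(.*?)\s*(?=\b(?:$label_re):|\z)/g) {
        $field{$1} = $2;
    }
    return \%field;
}

# Small demonstration using text modelled on the second record above.
my $sample = "\nTitle: \nGravitational waves from core collapse supernovae\n"
           . "Source: CLASSICAL AND QUANTUM GRAVITY Volume: 27 Issue: 19 "
           . "Special Issue: SI DOI: 10.1088/0264-9381/27/19/194005\n";
my $record = process_record(\$sample);
print "$_ => $record->{$_}\n" for sort keys %$record;
```

Note that the last field in a real cell (Times Cited here) will also pick up whatever text follows it, such as "(from All Databases)" and the text of the abstract toggle links, so you may want to keep only the leading digits of that value. A title that happens to contain one of the labels followed by a colon would also be split incorrectly; this is a sketch, not a robust parser.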