I have an HTML file that looks like this:
<HTML>
<BODY>
<TABLE width="100%" border="0" cellpadding="0" cellspacing="0">
<tr>
<td height="400" align="right" valign="top" class="text_rail_left"></td>
<td width="100%" align="left" valign="top" class="text_back_color"><table border="0" cellPadding="0" cellSpacing="0" width="100%"><tr>
</tr><tr>
<td width="100%" align="left" align="top"><table width="100%" border="0" cellspacing="2" cellpadding="0">
<tr>
<td align="center" valign="top" class="inside_heading_text">Train Names with Details</td>
</tr> <tr>
<td><b><BR><BR> SORRY !!! No Matching buses Found</b></td></tr>
<tr><td>
</td></tr></table>
<td align="left" valign="top" class="pad_self"><table width="100%" border="0" cellspacing="2" cellpadding="2">
<tr><td align="right" valign="top"> </td>
</tr></table></td>
</tr></table></td>
<td align="left" valign="top" class="text_rail_right"> </td>
</tr>
<tr>
<td width="10" align="left" valign="top"><img src="http://www.indianrail.gov.in/main_text_left_bottom2.gif" alt="" width="8"/></td>
<td width="100%" align="left" valign="top" class="text_rail_bottom"><img src="http://www.indianrail.gov.in/blank.gif" alt="" width="1" height="8" /></td>
<td width="10" align="right" valign="top"> <img src="http://www.indianrail.gov.in/main_text_right_bottom2.gif" alt="" width="8" /></td>
</tr></table><body>
<FONT size=1>No. of Queries : 0839425885
, Server : YAMUNA
, Dated : 15-05-2014 Time:07:15:26 Hrs</font></td></tr></table></td></tr> </table></td></tr></table></td></tr></table></td></tr><tr><td align="left"valign="top"><table width="970" border="0" cellspacing="0" cellpadding="0"><tr> <td width="9" align="left" valign="top"><img src="http://www.indianrail.gov.in/images/footer_upper_lft.gif" alt="" width="9" height="49" /></td><td width="100%%" align="left" valign="top" class="footer_upper"><table width="100%%" border="0" cellspacing="1" cellpadding="0"><tr><td align="center" valign="top" class="main_footer_upper"><a href="../index.html" onclick="resetButton()">Home </a> | <a href="http://www.indianrailways.gov.in/railwayboard/" target="_blank">Ministry of Railways</a> | <a href="../know_Station_Code.html" onclick="resetButton()">Trains between Stations</a> | <a href="../booking_Location.html" onclick="resetButton()">Booking Locations</a> | <a href="http://www.cris.org.in/" target="_blank">CRIS</a> | <a href="../about_Concert.html" onclick="resetButton()">CONCERT</a> | <a href="../advertisement.html" onclick="resetButton()">Advertise with CRIS</a> | <a href="http://www.indianrail.gov.in/images/rail-map.jpg" target="_blank">Railway Map</a> | <a href="../faq.html" onclick="resetButton()">FAQ</a> | <a href="../sitemap.html" onclick="resetButton()">Sitemap</a> | <a href="http://www.trainenquiry.com/Feedback.aspx" target="_blank" onclick="resetButton()">Feedback</a></td></tr><tr><td align="center"valign="top" class="copy_footer" style="padding-top:3px;"><span class="main_footer_copy"><a href="../copyright.html" onclick="resetButton()">Copyright</a></span> © 2010, Centre For Railway Information Systems, Designed and Hosted by CRIS | <span class="main_footer_copy"><a href="../disclaimer.html" onclick="resetButton()">Disclaimer</a></span><br />Best viewed at 1024 x 768 resolution with Internet Explorer 5.0 or Mozila Firefox 3.5 and higher</td></tr> </table></td><td width="9" align="right" valign="top"><img 
src="http://www.indianrail.gov.in/images/footer_upper_rgt.gif" alt="" width="9" height="49" /></td></tr></table></td></tr></table></td></tr></table><script type="text/javascript">anylinkmenu.init("menuanchorclass")</script>
</BODY>
</HTML>
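I want to write an XPath query to read the string
SORRY !!! No Matching buses Found
There is no unique class that identifies the cell containing this string. I tried the XPath query
@"//td[@class='inside_heading_text']/tr"
but it doesn't seem to work.
Can anyone point me in the right direction? I am using the Ono library in Objective-C to parse the HTML.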
Answer 0 (score: 1)
Well, this will get you the container of the "SORRY" text:
//*[contains(text(),'SORRY')]
I suggest using the FireFinder extension for Firebug (on Firefox) to try out XPath expressions easily.
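To see which element such a text search lands on, here is a minimal Python sketch using only the standard library. It is not Ono code: `SAMPLE` is a hypothetical, trimmed and well-formed fragment of the page, and `own_text` and `containers_of` are helper names made up for illustration. Because `xml.etree.ElementTree` has no `contains()`, the check is done by hand. One difference worth knowing: in XPath 1.0, `contains(text(),'SORRY')` only tests the first text node of each element, whereas this sketch looks at all direct text nodes, which corresponds more closely to `//*[text()[contains(.,'SORRY')]]`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed and well-formed fragment of the page in the question.
SAMPLE = """<table>
<tr><td class="inside_heading_text">Train Names with Details</td></tr>
<tr><td><b><br/><br/> SORRY !!! No Matching buses Found</b></td></tr>
</table>"""


def own_text(element):
    """Concatenate the text nodes that are direct children of `element`."""
    parts = [element.text or ""]
    # In ElementTree, text following a child element is stored as that child's tail.
    parts.extend(child.tail or "" for child in element)
    return "".join(parts)


def containers_of(root, needle):
    """Return every element with a direct text node containing `needle`."""
    return [el for el in root.iter() if needle in own_text(el)]


root = ET.fromstring(SAMPLE)
matches = containers_of(root, "SORRY")
print([el.tag for el in matches])
print(own_text(matches[0]).strip())
```

The match is the `b` element rather than the surrounding `td`, so the string still needs trimming because of the leading space after the two line breaks.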
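Answer 1 (score: 1)
That's some ugly HTML you have there.
There are unclosed elements, duplicate td/@align attributes, and so on. If you want to use XPath, you will have to clean it up first.
If you can at least clean it up, manually or automatically, into something like this: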
<?xml version="1.0" encoding="utf-8"?>
<HTML>
<BODY>
<TABLE width="100%" border="0" cellpadding="0" cellspacing="0">
<tr>
<td height="400" align="right" valign="top" class="text_rail_left">
</td>
<td width="100%" align="left" valign="top" class="text_back_color">
<table border="0" cellPadding="0" cellSpacing="0" width="100%">
<tr>
</tr>
<tr>
<td width="100%" align="left">
<table width="100%" border="0" cellspacing="2" cellpadding="0">
<tr>
<td align="center" valign="top" class="inside_heading_text">Train Names with Details</td>
</tr>
<tr>
<td>
<b>
<BR/>
<BR/> SORRY !!! No Matching buses Found</b>
</td>
</tr>
<tr>
<td>
</td>
</tr>
</table>
</td>
</tr>
</table>
</td>
<td align="left" valign="top" class="text_rail_right"></td>
</tr>
</TABLE>
</BODY>
</HTML>
Then this XPath will select the "SORRY ..." text, using the inside_heading_text cell you mentioned as the reference point:
//td[@class='inside_heading_text']/../following-sibling::tr[1]/td[1]/b
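The attempted query `//td[@class='inside_heading_text']/tr` looks for a `tr` child of the heading cell, but the message sits in the row after the heading's row, which is why the expression above steps up to the parent and then across to the following sibling. The same navigation can be sketched in Python with the standard library. This is an approximation, not Ono code: `xml.etree.ElementTree` does not support the `following-sibling` axis, so the sibling step is done by indexing into the table's rows. `CLEANED` is a hypothetical, shortened version of the cleaned-up markup, and `message_after_heading` is a name made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened version of the cleaned-up markup.
CLEANED = """<HTML><BODY><TABLE>
<tr>
<td align="center" valign="top" class="inside_heading_text">Train Names with Details</td>
</tr>
<tr>
<td><b><BR/><BR/> SORRY !!! No Matching buses Found</b></td>
</tr>
<tr><td></td></tr>
</TABLE></BODY></HTML>"""

HEADING = ".//td[@class='inside_heading_text']"


def message_after_heading(xml_text):
    """Return the bold text in the row that follows the heading row, or None."""
    root = ET.fromstring(xml_text)
    heading_row = root.find(HEADING + "/..")   # parent <tr> of the heading cell
    table = root.find(HEADING + "/../..")      # element that holds the rows
    if heading_row is None or table is None:
        return None
    rows = [child for child in table if child.tag == "tr"]
    position = rows.index(heading_row)
    if position + 1 >= len(rows):
        return None                            # no following-sibling row
    bold = rows[position + 1].find("td/b")     # first <b> inside a cell of that row
    if bold is None:
        return None
    # itertext() also picks up the text stored after the <BR/> elements.
    return "".join(bold.itertext()).strip()


print(message_after_heading(CLEANED))
```

With an XPath 1.0 engine, the single expression above replaces the manual row indexing.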