Is there a way to dynamically get the text from a particular <tr> tag on a page?
E.g., I have a page with a <tr> whose value is "a1". I want to get only the text inside this <tr> tag and echo it onto my page. Is this possible?
Here is the HTML:
<html><tr id='ieconn2' >
<td><table width='100%'><tr><td valign='top'><table width='100%'><tr><td><script type="text/javascript"><!--
google_ad_client = "pub-4503439170693445";
/* 300x250, created 7/21/10 */
google_ad_slot = "7608120147";
google_ad_width = 300;
google_ad_height = 250;
//-->
</script>
<script type="text/javascript"
src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
</script><br>When Marshall and Lily fear they will never get pregnant, they see a specialist who can hopefully help move the process along. Meanwhile, Robin starts her new job.<br><br><b>Source: </b>CBS
<br> </td></tr><tr><td><b>There are no foreign summaries for this episode:</b> <a href='/edit/shows/3918/episode_foreign_summary/?eid=1065002553&season=6'>Contribute</a></td></tr><tr><td><b>English Recap Available: </b> <a href='/How_I_Met_Your_Mother/episodes/1065002553?show_recap=1'>View Here</a></td></tr></table></td><td valign='top' width='250'><div align='left'>
<img alt='How I Met Your Mother season 6 episode 13' src="http://images.tvrage.com/screencaps/20/3918/1065002553.jpg" width="248" border='0' >
</div><div align='center'><a href='/How_I_Met_Your_Mother/episodes/1065002553?gallery=1'>6 gallery images</a></div></td></tr></table></td></tr><tr>
<td background='/_layout_v3/buttons/title.jpg' height='39' width='631' align='center'>
<table width='100%' cellpadding='0' cellspacing='0' style='margin: 1px 1px 1px 1px;'>
<tr>
<td align='left' style='cursor: pointer;' onclick="SwitchHeader('ieconn3','iehide3','26')" width='90'> <span style='font-size: 15px; font-weight: bold; color: black; padding-left: 8px;' id='iehide3'><img src='/_layout_v3/misc/minus.gif' width='26'></span></td>
<td align='center' style='cursor: pointer;' onclick="SwitchHeader('ieconn3','iehide3','26')" ><h5 class='nospace'>Sponsored Links</h5><a name=''></a></td>
<td align='left' width='90' > </td></tr></table></td>
</tr></html>
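All I want to get is this text: "When Marshall and Lily fear they will never get pregnant, they see a specialist who can hopefully help move the process along. Meanwhile, Robin starts her new job."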
Answer 0 (score: 3)
How about this?
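$dom = new DOMDocument;
// Collect libxml warnings about the invalid markup instead of printing them
libxml_use_internal_errors(true);
$dom->loadHTMLFile(...);
libxml_clear_errors();
$xpath = new DOMXPath($dom);
// libxml wraps the fragment in <html><body>, so the absolute path starts there;
// it matches every <td> at this depth of the nested tables, not just one cell
$nodes = $xpath->query('/html/body/tr/td/table/tr/td/table/tr/td');
foreach ($nodes as $node)
{
    // nodeValue is all text inside the cell, including any <script> contents
    echo $node->nodeValue, "\n";
}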
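A more targeted variant, sketched under assumptions: instead of an absolute path, scope the query to the row by its id (`ieconn2` in the posted HTML) and take the first non-blank text node that follows an ad `<script>`, which in this markup is the episode summary. `extractRowSummary` is a hypothetical helper name, and the "text right after the script" rule is an assumption about this particular page layout, not a general technique.

```php
<?php
// Hypothetical helper: returns the first non-blank text node that follows a
// <script> element inside the <tr> whose id is $rowId, or null if none exists.
// Assumes $rowId is trusted and contains no quote characters (it is spliced into XPath).
function extractRowSummary(string $html, string $rowId): ?string
{
    $dom = new DOMDocument();
    // Real-world markup is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Scope to the row by id, step over the script bodies, and take the first
    // sibling text node that is not only whitespace.
    $query = sprintf(
        "//tr[@id='%s']//script/following-sibling::text()[normalize-space()][1]",
        $rowId
    );
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->nodeValue);
}
```

Usage would be something like `echo extractRowSummary(file_get_contents($url), 'ieconn2');`, where `$url` is whatever page you are scraping. Anchoring on the id survives changes elsewhere in the page better than a full absolute path does.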
Answer 1 (score: 2)
If I understand what you want to do, you can do the following:
$url = "http://url.tld";
$str = file_get_contents($url);
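From there, just use PHP's string functions to strip out the parts you don't want (a regular expression may make this quicker).
If that doesn't work, you can try a more elaborate function: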
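A minimal sketch of the string-function route, assuming the summary always sits between the markers `</script><br>` and `<br><br>`, as it does in the posted markup. That holds for this page only; any layout change breaks it. `extractBetween` and `extractWithRegex` are hypothetical helper names.

```php
<?php
// Hypothetical helper: returns the text between the first occurrence of
// $start and the next occurrence of $end, or null if either marker is missing.
function extractBetween(string $haystack, string $start, string $end): ?string
{
    $from = strpos($haystack, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);
    $to = strpos($haystack, $end, $from);
    if ($to === false) {
        return null;
    }
    return trim(substr($haystack, $from, $to - $from));
}

// Regex variant of the same idea: lazy (.*?) stops at the first <br><br>,
// and the /s modifier lets . match newlines.
function extractWithRegex(string $haystack): ?string
{
    if (preg_match('#</script><br>(.*?)<br><br>#s', $haystack, $m) === 1) {
        return trim($m[1]);
    }
    return null;
}
```

With `$str` from `file_get_contents()`, `echo extractBetween($str, '</script><br>', '<br><br>');` would print the summary for this page.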
function get_url_contents($url){
    $crl = curl_init();
    $timeout = 5; // connection timeout in seconds
    curl_setopt($crl, CURLOPT_URL, $url);
    // Return the response body as a string instead of printing it
    curl_setopt($crl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($crl, CURLOPT_CONNECTTIMEOUT, $timeout);
    $ret = curl_exec($crl); // false on failure
    curl_close($crl);
    return $ret;
}
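The function above returns `false` silently when the request fails. A sketch of a variant that reports failures instead, assuming PHP's cURL extension is available; `fetchUrl` is a hypothetical name, and the redirect and overall-timeout settings are additions not present in the original answer.

```php
<?php
// Hypothetical name: fetchUrl() is not part of any library.
// Returns the response body, or throws RuntimeException on failure.
function fetchUrl(string $url, int $timeout = 5): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,      // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => $timeout,  // seconds allowed for connecting
        CURLOPT_TIMEOUT => 2 * $timeout,     // seconds allowed for the whole transfer
        CURLOPT_FOLLOWLOCATION => true,      // follow HTTP redirects
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        // Transport-level failure: DNS error, refused connection, timeout, ...
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("Request failed: $error");
    }
    // HTTP status code; 0 for non-HTTP schemes such as file://
    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    curl_close($ch);
    if ($status >= 400) {
        throw new RuntimeException("HTTP status $status for $url");
    }
    return $body;
}
```

Throwing keeps a failed download from being passed on as an empty or `false` string to the parsing step.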
Answer 2 (score: 1)
Use QueryPath: http://querypath.org/. It's a jQuery-like library for PHP.