I don't understand how to get the text that follows a tag, e.g. the "Text" in (<b></b> Text).
I am scraping this website and getting the HTML shown below.
From those details I want to create an array() like this:
array(
'release_year'=> 2009,
'genre' => 'Drama,Fantasy,Horror',
'description' => 'etc etc etc',
'imdb' => 'link of imdb',
'total_episode'=> '128 episodes',
'latest_episode_title'=> 'title',
'latest_episode_link' => 'link',
'latest_episode_with_link_title'=> 'title',
'latest_episode_with_link_link' => 'link',
);
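I have managed to get the text inside the <b></b> tags, but I don't know how to get the text that comes after a <b> tag in the HTML. Please look at the HTML, my PHP code and the result below and help me solve this. Thanks a lot in advance.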
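The text after </b> is not part of the <b> element at all: in the DOM it is the element's next sibling, a text node, which is why reading the inner text of <b> never returns it. A minimal sketch of the idea, using PHP's built-in DOM extension (DOMDocument) instead of simple_html_dom, on a hypothetical two-entry fragment of the HTML below:

```php
<?php
// Hypothetical fragment modelled on the question's <p> element.
$fragment = '<p><b>Release Year: </b> 2009<br><b>No. of episodes: </b> 128 episodes <br></p>';

libxml_use_internal_errors(true); // ignore warnings about imperfect HTML
$doc = new DOMDocument();
$doc->loadHTML($fragment);

$values = [];
foreach ($doc->getElementsByTagName('b') as $b) {
    $label = trim($b->textContent);  // text INSIDE <b>, e.g. "Release Year:"
    $next  = $b->nextSibling;        // the node right AFTER </b>
    if ($next instanceof DOMText) {
        $values[$label] = trim($next->textContent);
    }
}
print_r($values);
```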
Here is the HTML:
<div class="show-summary">
<table border="0" style="padding:3px">
<tbody>
<tr>
<td style="padding:3px">
<a href="/serie/the_vampire_diaries">
<img src="http://static1.watchseries.ag/90/1/The_Vampire_Diaries-18597.JPEG" alt="Watch Series - The Vampire Diaries" title="Watch Series - The Vampire Diaries" height="120px" width="85px">
</a>
</td>
<td valign="top" style="padding:3px">
<p>
<b>Release Year: </b>
2009<br>
<b>Genre: <a href="/genres/Drama">Drama</a>, <a href="/genres/Fantasy">Fantasy</a>, <a href="/genres/Horror">Horror</a></b>
<br>
<b>External Links: </b>
<a href="http://www.imdb.com/title/tt1405406/" target="_blank">IMDB</a>
<br>
<b>No. of episodes: </b>
128 episodes <br>
<b>Latest Episode: </b>
<a title="Watch The Vampire Diaries Latest Episode (The Vampire Diaries Season 6 Episode 16)" href="/episode/the_vampire_diaries_s6_e16.html">Season 6 Episode 16 The Downward Spiral (26/02/2015)</a>
<br>
<b>Latest Episode With Links: </b>
<a title="Watch The Vampire Diaries Latest Episode (The Vampire Diaries Season 6 Episode 11)" href="/episode/the_vampire_diaries_s6_e11.html">Season 6 Episode 11 Woke Up With a Monster (22/01/2015)</a>
<br>
</p>
<div style="float: left; height: 30px; overflow: hidden; width: 100px;">
<div class="fb-like fb_iframe_widget" data-href="http://watchseries.ag/serie/the_vampire_diaries" data-send="false" data-layout="button_count" data-show-faces="false" fb-xfbml-state="rendered" fb-iframe-plugin-query="app_id=434603673340441&href=http%3A%2F%2Fwatchseries.ag%2Fserie%2Fthe_vampire_diaries&layout=button_count&locale=en_US&sdk=joey&send=false&show_faces=false">
<span style="vertical-align: bottom; width: 79px; height: 20px;">
<iframe name="fbc5b3f58" width="1000px" height="1000px" frameborder="0" allowtransparency="true" scrolling="no" title="fb:like Facebook Social Plugin" src="http://www.facebook.com/plugins/like.php?app_id=434603673340441&channel=http%3A%2F%2Fstatic.ak.facebook.com%2Fconnect%2Fxd_arbiter%2F7r8gQb8MIqE.js%3Fversion%3D41%23cb%3Df314058a5%26domain%3Dwatchseries.ag%26origin%3Dhttp%253A%252F%252Fwatchseries.ag%252Ff5fff1c%26relation%3Dparent.parent&href=http%3A%2F%2Fwatchseries.ag%2Fserie%2Fthe_vampire_diaries&layout=button_count&locale=en_US&sdk=joey&send=false&show_faces=false" style="border: none; visibility: visible; width: 79px; height: 20px;" class="" __idm_id__="824321"></iframe>
</span>
</div>
</div>
<iframe id="twitter-widget-1" scrolling="no" frameborder="0" allowtransparency="true" src="http://platform.twitter.com/widgets/tweet_button.b68aed79dd9ad79554bcd8c9141c94c8.en.html#_=1422079075304&count=horizontal&dnt=false&id=twitter-widget-1&lang=en&original_referer=http%3A%2F%2Fwatchseries.ag%2Fserie%2Fthe_vampire_diaries&size=m&text=Watch%20The%20Vampire%20Diaries%20Serie%20Online%20-%20Watch%20Series&url=http%3A%2F%2Fwatchseries.ag%2Fserie%2Fthe_vampire_diaries" class="twitter-share-button twitter-tweet-button twitter-share-button twitter-count-horizontal" title="Twitter Tweet Button" data-twttr-rendered="true" style="width: 107px; height: 20px;"></iframe>
<script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+'://platform.twitter.com/widgets.js';fjs.parentNode.insertBefore(js,fjs);}}(document, 'script', 'twitter-wjs');</script>
<br clear="all">
<b>Description :</b>
The vampire brothers Damon and Stefan Salvatore, eternal adolescents, having been leading 'normal' lives, hiding their bloodthirsty condition, for centuries, moving on before their non-aging is noticed.
<span id="plot_mored"> They are back in the Virginia town where they became vampires. Stefan is noble, denying himself blood to avoid killing, and tries to control his evil brother Damon. Stefan falls in love with schoolgirl Elena, whose best friend is a witch, like her grandma.</span>
<a onclick="return showMoreContent('plot_mored');" class="small dark" href="#" id="more" style="display: none;">[+]more</a>
<br>
<p></p>
</td>
</tr>
</tbody>
</table>
</div>
Here is my PHP code:
require_once 'simple_html_dom.php'; // the Simple HTML DOM parser library
$html = new simple_html_dom();
$html->load_file("LINK");
foreach($html->find('div.show-summary table tbody tr') as $rowz){
foreach($rowz->find('p') as $p){
foreach($p->find('b') as $b){
echo $b->innertext.'<br/>';
}
}
}
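Running the code above, I get the following result:
Release Year:
Genre: Drama, Fantasy, Horror
External Links:
No. of episodes:
Latest Episode:
Latest Episode With Links:
Description :
I want to build an array from these details.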
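A sketch that fills the target array from a trimmed copy of the HTML above (social widgets removed), using PHP's built-in DOMDocument/DOMXPath rather than simple_html_dom. For each <b> label it collects the sibling nodes up to the next <br>; Genre is handled separately because its links sit inside the <b> itself. The helper name collectAfter, and the choice of the link text for the *_title keys, are assumptions, not part of the original question.

```php
<?php
// Trimmed copy of the question's HTML (social widgets removed); sample data only.
$html = <<<'HTML'
<div class="show-summary"><table><tr><td valign="top"><p>
<b>Release Year: </b> 2009<br>
<b>Genre: <a href="/genres/Drama">Drama</a>, <a href="/genres/Fantasy">Fantasy</a>, <a href="/genres/Horror">Horror</a></b><br>
<b>External Links: </b> <a href="http://www.imdb.com/title/tt1405406/" target="_blank">IMDB</a><br>
<b>No. of episodes: </b> 128 episodes <br>
<b>Latest Episode: </b> <a href="/episode/the_vampire_diaries_s6_e16.html">Season 6 Episode 16 The Downward Spiral (26/02/2015)</a><br>
<b>Latest Episode With Links: </b> <a href="/episode/the_vampire_diaries_s6_e11.html">Season 6 Episode 11 Woke Up With a Monster (22/01/2015)</a><br>
</p>
<b>Description :</b> The vampire brothers Damon and Stefan Salvatore, eternal adolescents, having been leading 'normal' lives, hiding their bloodthirsty condition, for centuries, moving on before their non-aging is noticed.<span id="plot_mored"> They are back in the Virginia town where they became vampires.</span><br>
</td></tr></table></div>
HTML;

// Hypothetical helper: concatenate the text of the siblings that follow a <b>
// label up to the next <br>, and remember any <a> elements among them.
function collectAfter(DOMNode $label): array
{
    $text  = '';
    $links = [];
    for ($n = $label->nextSibling; $n !== null; $n = $n->nextSibling) {
        if ($n->nodeName === 'br') {
            break;
        }
        $text .= $n->textContent;
        if ($n instanceof DOMElement && $n->nodeName === 'a') {
            $links[] = $n;
        }
    }
    // Collapse runs of whitespace/newlines into single spaces.
    return [trim(preg_replace('/\s+/', ' ', $text)), $links];
}

libxml_use_internal_errors(true); // real-world HTML triggers parser warnings
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$data = [];
foreach ($xpath->query('//div[@class="show-summary"]//b') as $b) {
    // "Genre: Drama, ..." -> label "Genre", rest "Drama, ..." (split on first colon only)
    [$label, $inside] = array_map('trim', explode(':', $b->textContent, 2)) + ['', ''];
    [$after, $links]  = collectAfter($b);
    $href = $links ? $links[0]->getAttribute('href') : '';
    $text = $links ? trim($links[0]->textContent) : '';

    switch (strtolower($label)) {
        case 'release year':
            $data['release_year'] = (int) $after;
            break;
        case 'genre': // the genre links are inside the <b> itself
            $data['genre'] = str_replace(', ', ',', $inside);
            break;
        case 'external links':
            $data['imdb'] = $href;
            break;
        case 'no. of episodes':
            $data['total_episode'] = $after;
            break;
        case 'latest episode':
            $data['latest_episode_title'] = $text;
            $data['latest_episode_link']  = $href;
            break;
        case 'latest episode with links':
            $data['latest_episode_with_link_title'] = $text;
            $data['latest_episode_with_link_link']  = $href;
            break;
        case 'description':
            $data['description'] = $after;
            break;
    }
}
print_r($data);
```

Against the live page, the sample string would be replaced by the downloaded HTML (e.g. from file_get_contents), and the label strings in the switch would need updating if the site changes its wording.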
Answer 0 (score: 0)
Have you tried adding the p and b tags to the search:
$html->find('div.show-summary table tbody tr p b')
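This is only one approach and it is not complete, but it should give you the idea. Getting the release year is a bit tricky and there is probably a better way, but this works: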
$html = new simple_html_dom();
$html->load_file('yourhtmlfile.html');
# set the 'mapping': map the search to the field you need
$map = array(
array(
'query'=>'div.show-summary table tbody tr p',
'nodeIndex'=>0,
'attribute'=>'',
'method'=>'innertext',
'extract_string'=>'',
'get_string_between'=>array(
'start'=>'<b>Release Year: </b>',
'end'=>'<br>',
),
'field'=>'Release Year',
),
array(
'query'=>'div.show-summary table tbody tr p b',
'nodeIndex'=>1,
'attribute'=>'',
'method'=>'plaintext',
'extract_string'=>'Genre: ',
'get_string_between'=>'',
'field'=>'Genre',
),
array(
'query'=>'div.show-summary table tbody tr p a',
'nodeIndex'=>4,
'attribute'=>'title',
'method'=>'',
'extract_string'=>'',
'get_string_between'=>'',
'field'=>'Latest Episode title',
),
array(
'query'=>'div.show-summary table tbody tr p a',
'nodeIndex'=>4,
'attribute'=>'href',
'method'=>'',
'extract_string'=>'',
'get_string_between'=>'',
'field'=>'Latest Episode link',
),
);
# the resulting array with fields values
$fieldsResult = array();
foreach($map as $search)
{
# get the search result node
$node = $html->find($search['query'],$search['nodeIndex']);
# get the node attributes
$node_attributes = $node->attr;
# attribute set in the map? get it.
$content = $search['attribute']!=''
? $node_attributes[$search['attribute']]
: '';
# method set in the map? get it
$content = $search['method']!=''
? $node->{$search['method']}
: $content;
# string to be cleaned? extract it
$result = $search['extract_string']!=''
? str_replace($search['extract_string'], '', $content)
: $content;
# get content from between two string marks (only when the map provides them)
if(is_array($search['get_string_between']))
{
$result = trim($result);
$start = $search['get_string_between']['start'];
$end = $search['get_string_between']['end'];
$substring_start = strpos($result, $start) + strlen($start);
# look for the end mark only after the start mark
$end_pos = strpos($result, $end, $substring_start);
# substr() takes a length, not an end position
$result = trim(substr($result, $substring_start, $end_pos - $substring_start));
}
# final result
$fieldsResult[$search['field']] = $result;
}
var_dump($fieldsResult);
////////////
// OUTPUT //
////////////
array (size=4)
'Release Year' => string '2009' (length=4)
'Genre' => string 'Drama, Fantasy, Horror' (length=22)
'Latest Episode title' => string 'Watch The Vampire Diaries Latest Episode (The Vampire Diaries Season 6 Episode 16)' (length=82)
'Latest Episode link' => string '/episode/the_vampire_diaries_s6_e16.html' (length=40)
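The get_string_between step in the map above is plain substring arithmetic. Pulled out as a standalone function (the name stringBetween is hypothetical), a sketch looks like this; note that substr()'s third argument is a length, so it has to be computed as end offset minus start offset:

```php
<?php
// Hypothetical helper: return the text between the first $start marker and
// the first $end marker that follows it, or null if either marker is missing.
function stringBetween(string $haystack, string $start, string $end): ?string
{
    $from = strpos($haystack, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);              // skip past the start marker
    $to = strpos($haystack, $end, $from); // search for $end only after $start
    if ($to === false) {
        return null;
    }
    return trim(substr($haystack, $from, $to - $from)); // length = $to - $from
}

$p = '<b>Release Year: </b> 2009<br><b>Genre: </b>Drama<br>';
var_dump(stringBetween($p, '<b>Release Year: </b>', '<br>'));
```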
Answer 1 (score: 0)
This may not be exactly what you want, and it will break if the page markup changes much, but you could do something like this:
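require_once 'simple_html_dom.php';
$html = new simple_html_dom();
$html->load_file("LINK");
foreach($html->find('div.show-summary table tbody tr') as $rowz){
foreach($rowz->find('p') as $p){
$matches = explode('<br>',$p->innertext);
foreach ($matches as $entry) {
// "~" delimiter so the "/" in "</b>" is not read as the end of the pattern;
// "s" lets "." match newlines. Entries without a <b> label are skipped.
if (preg_match('~<b>(.*?)</b>(.*)~is', $entry, $stuff)) {
echo trim(strip_tags($stuff[1])) . ' => ' . trim(strip_tags($stuff[2])) . '<br>';
}
}
}
}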
Sorry, you may need to clean this up and tweak it to get exactly the result you want, and check for errors and undefined entries.
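The explode-on-<br> plus regex idea can also be tried on a plain string, without simple_html_dom installed. This sketch uses a hypothetical, trimmed innertext of the <p>. Note that the Genre values sit inside the <b>, so with this pattern they end up in the key and need separate handling:

```php
<?php
// Hypothetical innertext of the <p> element (trimmed from the question's HTML).
$inner = '<b>Release Year: </b> 2009<br><b>Genre: <a href="/genres/Drama">Drama</a>, <a href="/genres/Horror">Horror</a></b><br><b>No. of episodes: </b> 128 episodes <br>';

$pairs = [];
foreach (explode('<br>', $inner) as $entry) {
    // "~" as delimiter, so the "/" in "</b>" needs no escaping; "s" lets "." cross newlines.
    if (preg_match('~<b>(.*?)</b>(.*)~is', $entry, $m)) {
        $pairs[trim(strip_tags($m[1]))] = trim(strip_tags($m[2]));
    }
}
print_r($pairs);
```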
Answer 2 (score: 0)
Hello everyone, I now have a complete solution. After a lot of research I wrote the function below, which does what I wanted; check it out:
<?php
function do_html_array($td,$dlm='<br>'){
if(!empty($td)){
$td = html_entity_decode($td);
$td = preg_replace('/<script\b[^>]*>(.*?)<\/script>/is', "", $td);
$html_array = explode($dlm,$td);
$html_key_array = array();
foreach($html_array as $key=>$html){
$html = explode(':',trim(strip_tags($html)),2); // split on the first colon only, so colons in values survive
if(trim($html[0])!=''){
if(count($html)<2) $html[1] = ''; // no colon: store an empty value
if(strtolower(trim($html[0]))=='description') $html[1] = str_ireplace('[+]more','',$html[1]);
$html_key_array[strtolower(trim($html[0]))] = trim($html[1]);
switch(trim(strtolower($html[0]))){
case'external links':
preg_match_all('~<a\s+.*?</a>~is',$html_array[$key],$html_key_array['imdb_link']);
break;
case'genre':
preg_match_all('~<a\s+.*?</a>~is',$html_array[$key],$html_key_array['genre_link']);
break;
// further define here...
}
}
}
return $html_key_array;
}
return false;
}
$td = '<td valign="top" style="padding:3px"><p><b>Release Year: </b>2007<br><b>Genre: <a href="/genres/Comedy">Comedy</a></b><br><b>External Links: </b> <a target="_blank" href="http://www.imdb.com/title/tt0898266/">IMDB</a> <br><b>No. of episodes: </b> 178 episodes <br><b>Latest Episode: </b> <a href="/episode/big_bang_theory_s8_e16.html" title="Watch The Big Bang Theory Latest Episode (The Big Bang Theory Season 8 Episode 16)">Season 8 Episode 16 The Intimacy Acceleration (01/01/1970)</a><br><b>Latest Episode With Links: </b> <a href="/episode/big_bang_theory_s8_e13.html" title="Watch The Big Bang Theory Latest Episode (The Big Bang Theory Season 8 Episode 13)">Season 8 Episode 13 The Anxiety Optimization (15/01/2015)</a><br></p><div style="float: left; height: 30px; overflow: hidden; width: 100px;"><div data-show-faces="false" data-layout="button_count" data-send="false" data-href="http://watchseries.ag/serie/big_bang_theory" class="fb-like fb_iframe_widget" fb-xfbml-state="rendered" fb-iframe-plugin-query="app_id=434603673340441&href=http%3A%2F%2Fwatchseries.ag%2Fserie%2Fbig_bang_theory&layout=button_count&locale=en_US&sdk=joey&send=false&show_faces=false"><span style="vertical-align: bottom; width: 80px; height: 20px;"><iframe width="1000px" height="1000px" frameborder="0" name="f225e71df2e6d02" allowtransparency="true" scrolling="no" title="fb:like Facebook Social Plugin" style="border: medium none; visibility: visible; width: 80px; height: 20px;" src="http://www.facebook.com/plugins/like.php?app_id=434603673340441&channel=http%3A%2F%2Fstatic.ak.facebook.com%2Fconnect%2Fxd_arbiter%2FDU1Ia251o0y.js%3Fversion%3D41%23cb%3Df1f47ad29892336%26domain%3Dwatchseries.ag%26origin%3Dhttp%253A%252F%252Fwatchseries.ag%252Ff18c568fa0d51e4%26relation%3Dparent.parent&href=http%3A%2F%2Fwatchseries.ag%2Fserie%2Fbig_bang_theory&layout=button_count&locale=en_US&sdk=joey&send=false&show_faces=false" class=""></iframe></span></div></div><iframe frameborder="0" id="twitter-widget-1" 
scrolling="no" allowtransparency="true" src="http://platform.twitter.com/widgets/tweet_button.67ae45a68af44ab435dd5797206058d3.en.html#_=1422780550826&count=horizontal&dnt=false&id=twitter-widget-1&lang=en&original_referer=http%3A%2F%2Fwatchseries.ag%2Fserie%2Fbig_bang_theory&size=m&text=Watch%20The%20Big%20Bang%20Theory%20Serie%20Online%20-%20Watch%20Series&url=http%3A%2F%2Fwatchseries.ag%2Fserie%2Fbig_bang_theory" class="twitter-share-button twitter-tweet-button twitter-share-button twitter-count-horizontal" title="Twitter Tweet Button" data-twttr-rendered="true" style="width: 109px; height: 20px;"></iframe><script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?\'http\':\'https\';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+\'://platform.twitter.com/widgets.js\';fjs.parentNode.insertBefore(js,fjs);}}(document, \'script\', \'twitter-wjs\');</script><br clear="all"><b>Description :</b> A woman who moves into an apartment across the hall from two brilliant but socially awkward physicists shows them how little they know about life outside of the laboratory.<br><p></p></td>';
$html_array = do_html_array($td);
if($html_array){
foreach($html_array as $key=>$value){
if(is_array($value)){
echo "<strong>$key</strong>:";
foreach($value[0] as $link){
echo "$link , ";
}
echo "<br>--------------------------------<br>";
}else{
echo "<strong>$key</strong>: $value";
echo "<br>--------------------------------<br>";
}
}
}
?>
The function above extracts all the text and stores it in the array as key/value pairs :)
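One detail worth isolating from do_html_array() is how a "Label: value" line is split: explode(':', $line) with no limit keeps only the text up to a second colon in the value, whereas a limit of 2 keeps the whole value. A sketch (the function name splitLabel and the "Season 1: Pilot" value are hypothetical):

```php
<?php
// Hypothetical helper: split "Label: value" into a lowercase key and a value,
// cutting at the FIRST colon only so colons inside the value are kept.
function splitLabel(string $line): array
{
    $parts = explode(':', $line, 2);
    $key   = strtolower(trim($parts[0]));
    $value = isset($parts[1]) ? trim($parts[1]) : ''; // no colon: empty value
    return [$key, $value];
}

print_r(splitLabel('Description : Season 1: Pilot'));
print_r(splitLabel('Release Year: 2009'));
print_r(splitLabel('IMDB'));
```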