I am trying to get the href, the src, and the movie name from each item-holder-account container, collected into an array. The HTML looks like this:
<div id="item_container">
<div class="item-holder-account">
<a href="movie1.html">
<span class="rollover"></span>
<img src="movie1.png" alt="">
<h2 class="list-item-title">Movie 1 <span class="paragraph-end"></span></h2>
</a>
</div>
<div class="item-holder-account">
<a href="movie2.html">
<span class="rollover"></span>
<img src="movie2.png" alt="">
<h2 class="list-item-title">Movie 2 <span class="paragraph-end"></span></h2>
</a>
</div>
<div class="item-holder-account">
<a href="movie3.html">
<span class="rollover"></span>
<img src="movie3.png" alt="">
<h2 class="list-item-title">Movie 3 <span class="paragraph-end"></span></h2>
</a>
</div>
</div>
I have tried, but I am stuck. The result should look like this:
movie1.html
movie1.png
Movie 1
movie2.html
movie2.png
Movie 2
movie3.html
movie3.png
Movie 3
How can I solve this?
Answer 0 (score: 1)
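I would go with DOMXPath. Based on your example, you can query all div elements that have the class item-holder-account and then extract the necessary data from each one. The following script should do what you want: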
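<?php
// Path to the HTML file, passed as the first command-line argument.
$file = $argv[1];
$html = file_get_contents($file);

$dom = new DOMDocument('1.0', 'UTF-8');
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$data = [];
// Select every div whose class attribute is exactly "item-holder-account".
foreach ($xpath->query('//div[@class="item-holder-account"]') as $div) {
    // Each container wraps its image and title in a single link.
    foreach ($div->getElementsByTagName('a') as $item) {
        $data[] = [
            'href' => $item->getAttribute('href'),
            'img'  => $item->getElementsByTagName('img')->item(0)->getAttribute('src'),
            // trim() drops the whitespace left before the empty trailing span.
            'text' => trim($item->getElementsByTagName('h2')->item(0)->nodeValue),
        ];
    }
}

print_r($data);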
Result:
Array
(
[0] => Array
(
[href] => movie1.html
[img] => movie1.png
[text] => Movie 1
)
[1] => Array
(
[href] => movie2.html
[img] => movie2.png
[text] => Movie 2
)
[2] => Array
(
[href] => movie3.html
[img] => movie3.png
[text] => Movie 3
)
)
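One possible hardening of the script above, offered as a sketch and not as part of the original answer: the predicate `@class="item-holder-account"` only matches when that is the sole class on the div, and `loadHTML` prints warnings on markup it does not recognise. The version below matches the class as a whitespace-separated token and collects libxml warnings internally. The function name `extractMovies` and the extra `featured` class are hypothetical, added only for illustration.

```php
<?php
// Sample markup based on the question. The "featured" class on the first
// div is a made-up addition to show that multi-class divs still match.
$html = <<<'HTML'
<div id="item_container">
  <div class="item-holder-account featured">
    <a href="movie1.html">
      <span class="rollover"></span>
      <img src="movie1.png" alt="">
      <h2 class="list-item-title">Movie 1 <span class="paragraph-end"></span></h2>
    </a>
  </div>
  <div class="item-holder-account">
    <a href="movie2.html">
      <span class="rollover"></span>
      <img src="movie2.png" alt="">
      <h2 class="list-item-title">Movie 2 <span class="paragraph-end"></span></h2>
    </a>
  </div>
</div>
HTML;

// Hypothetical helper: returns one [href, img, text] entry per link.
function extractMovies(string $html): array
{
    $dom = new DOMDocument();
    // Collect parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Match "item-holder-account" as one token of the class attribute.
    $query = '//div[contains(concat(" ", normalize-space(@class), " "), '
           . '" item-holder-account ")]//a';

    $movies = [];
    foreach ($xpath->query($query) as $a) {
        $img = $a->getElementsByTagName('img')->item(0);
        $h2  = $a->getElementsByTagName('h2')->item(0);
        $movies[] = [
            'href' => $a->getAttribute('href'),
            // Guard against containers that lack an image or a title.
            'img'  => $img ? $img->getAttribute('src') : null,
            'text' => $h2 ? trim($h2->textContent) : null,
        ];
    }
    return $movies;
}

$movies = extractMovies($html);
print_r($movies);
```

The null checks mean a container without an img or h2 yields a null field instead of a fatal error, which may or may not be what you want for your data.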
Answer 1 (score: 0)
You can use a DOM parser such as PHP Simple HTML DOM Parser:
<?php
$str = '<div id="item_container">
<div class="item-holder-account">
<a href="movie1.html"> <span class="rollover"></span>
<img src="movie1.png" alt="">
<h2 class="list-item-title">Movie 1 <span class="paragraph-end"></span></h2>
</a>
</div>
<div class="item-holder-account">
<a href="movie2.html"> <span class="rollover"></span>
<img src="movie2.png" alt="">
<h2 class="list-item-title">Movie 2 <span class="paragraph-end"></span></h2>
</a>
</div>
<div class="item-holder-account">
<a href="movie3.html"> <span class="rollover"></span>
<img src="movie3.png" alt="">
<h2 class="list-item-title">Movie 3 <span class="paragraph-end"></span></h2>
</a>
</div>
</div>';
require 'simple_html_dom.php';

$html = str_get_html($str);
$arr = array();
// One sub-array per container: [href, src, title markup].
foreach ($html->find('.item-holder-account') as $element) {
    $subarr = array();
    foreach ($element->find('a') as $a) {
        $subarr[] = $a->href;
    }
    foreach ($element->find('img') as $img) {
        $subarr[] = $img->src;
    }
    foreach ($element->find('h2') as $h2) {
        // innertext keeps the nested <span> markup; plaintext would strip it.
        $subarr[] = $h2->innertext;
    }
    $arr[] = $subarr;
}

echo '<pre>';
var_dump($arr);
echo '</pre>';
/* output
array(3) {
[0]=>
array(3) {
[0]=>
string(11) "movie1.html"
[1]=>
string(10) "movie1.png"
[2]=>
string(43) "Movie 1 <span class="paragraph-end"></span>"
}
[1]=>
array(3) {
[0]=>
string(11) "movie2.html"
[1]=>
string(10) "movie2.png"
[2]=>
string(43) "Movie 2 <span class="paragraph-end"></span>"
}
[2]=>
array(3) {
[0]=>
string(11) "movie3.html"
[1]=>
string(10) "movie3.png"
[2]=>
string(43) "Movie 3 <span class="paragraph-end"></span>"
}
}
*/