Starting from this HTML page:
https://www.sports-reference.com/olympics/summer/1896/ATH/
I am trying to extract some information with the following script:
<?php
include_once ('C:\moduli\simple_html_dom.php');

function getTextBetweenTags($url, $tagname) {
    $values = array();
    $html = file_get_html($url);
    foreach ($html->find($tagname) as $tag) {
        //echo $tag;
        foreach ($tag->find('a') as $a) {
            //echo $a;
            $values[] = $a->innertext . '<br>';
            //echo $values[0];
        }
        print_r($values);
        unset($values);
    }
    //$result=explode("'s",$values[0]);
    //array_pop($result);
    //return $result;
}

$output = getTextBetweenTags('https://www.sports-reference.com/olympics/summer/1896/ATH/', 'tr class=""');
//echo '<pre>';
?>
What I get from the print_r of the array inside the loop is the following (first rows only):
Array ( ) Array ( [0] => Men's 100 metres
[1] => Tom Burke
[2] => Fritz Hofmann
[3] => Alajos Szokoly
[4] => Frank Lane
) Array ( [0] => Men's 400 metres
[1] => Tom Burke
[2] => Herbert Jamison
[3] => Charles Gmelin
) Array ( [0] => Men's 800 metres
[1] => Teddy Flack
[2] => Nándor Dáni
[3] => Dimitrios Golemis
) Array ( [0] => Men's 1,500 metres
[1] => Teddy Flack
[2] => Arthur C. Blake
[3] => Albin Lermusiaux
I would like to store this in separate variables (for example, for the 100 metres):
100 metres
Men
Tom Burke
USA --> (this one taken from "alt" attribute inside html)
Gold --> (static parameter for the first athlete)
then reset everything and get the second iteration:
100 metres
Men
Fritz Hofmann
GER --> (this one taken from "alt" attribute inside html)
Silver --> (static parameter for the second athlete)
The last two athletes both won bronze, so for them I would like to get:
100 metres
Men
Alajos Szokoly
HUN --> (this one taken from "alt" attribute inside html)
Bronze --> (static parameter for the third athlete)
and
100 metres
Men
Frank Lane
USA --> (this one taken from "alt" attribute inside html)
Bronze --> (static parameter for the fourth athlete)
The last two athletes are identifiable because in the HTML they are in the same row, under the td align="left" attribute.
How can I get this? Thanks.
Answer 0 (score: 1)
This should work for you:
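function getTextBetweenTags($url, $tagname)
{
    $values = array();
    $html = file_get_html($url);
    foreach($html->find($tagname) as $tag)
    {
        //echo $tag;
        // Collect one entry per table cell of this row.
        foreach($tag->find('td') as $td)
        {
            $a_tags = $td->find('a');
            if(count($a_tags) == 0)
            {
                // Cell without links.
                $val = "";
            }
            elseif(count($a_tags) == 1)
            {
                // Single link: the event or a single medalist.
                $val = $a_tags[0]->innertext . '<br>';
            }
            else
            {
                // Several links in one cell: athletes sharing a medal.
                $val = array();
                foreach($a_tags as $a)
                {
                    $val[] = $a->innertext . '<br>';
                }
            }
            $values[] = $val;
        }
        print_r($values);
        // Reset (rather than unset) so print_r never sees an undefined variable.
        $values = array();
    }
}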
This outputs the arrays in the following format:
Array
(
    [0] => Men's 100 metres<br>
    [1] => Tom Burke<br>
    [2] => Fritz Hofmann<br>
    [3] => Array
        (
            [0] => Alajos Szokoly<br>
            [1] => Frank Lane<br>
        )
)
Array
(
    [0] => Men's 400 metres<br>
    [1] => Tom Burke<br>
    [2] => Herbert Jamison<br>
    [3] => Charles Gmelin<br>
)
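To get all the way to one record per athlete (event, gender, name, country, medal), something along these lines might help. It is only a sketch under assumptions: it uses PHP's built-in DOMDocument and DOMXPath instead of simple_html_dom, and it runs on a small hypothetical sample row, since the real page's markup is assumed here to have one td per medal, with each athlete written as an img (country code in alt) followed by an a. The function name extractMedalists and the sample HTML are made up for illustration.

```php
<?php
// Hypothetical sample: the real page is ASSUMED to use one <td> per medal,
// with each athlete as <img alt="country code"> followed by <a>name</a>.
$sampleHtml = <<<'HTML'
<table>
<tr><th>Event</th><th>Gold</th><th>Silver</th><th>Bronze</th></tr>
<tr class="">
<td align="left"><a href="#">Men's 100 metres</a></td>
<td align="left"><img src="usa.png" alt="USA"> <a href="#">Tom Burke</a></td>
<td align="left"><img src="ger.png" alt="GER"> <a href="#">Fritz Hofmann</a></td>
<td align="left"><img src="hun.png" alt="HUN"> <a href="#">Alajos Szokoly</a><br><img src="usa.png" alt="USA"> <a href="#">Frank Lane</a></td>
</tr>
</table>
HTML;

function extractMedalists(string $html): array
{
    // Cell position within the row decides the medal (cell 0 is the event).
    $medals = array(1 => 'Gold', 2 => 'Silver', 3 => 'Bronze');

    $doc = new DOMDocument();
    libxml_use_internal_errors(true);              // tolerate imperfect HTML
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html); // hint that input is UTF-8
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $records = array();
    // Only rows that contain <td> cells; header rows with <th> are skipped.
    foreach ($xpath->query('//tr[td]') as $tr) {
        $cells = $xpath->query('td', $tr);
        $eventLink = $xpath->query('.//a', $cells->item(0))->item(0);
        if ($eventLink === null) {
            continue;
        }
        // "Men's 100 metres" -> gender "Men", event "100 metres"
        $title  = trim($eventLink->textContent);
        $parts  = explode("'s ", $title, 2);
        $gender = count($parts) === 2 ? $parts[0] : '';
        $event  = count($parts) === 2 ? $parts[1] : $title;

        foreach ($medals as $index => $medal) {
            $cell = $cells->item($index);
            if ($cell === null) {
                continue;
            }
            $country = '';
            // Walk images and links in document order: an <img> sets the
            // country for the next <a>, so shared medals yield several records.
            foreach ($xpath->query('.//*[self::img or self::a]', $cell) as $node) {
                if ($node->nodeName === 'img') {
                    $country = $node->getAttribute('alt');
                } else {
                    $records[] = array(
                        'event'   => $event,
                        'gender'  => $gender,
                        'athlete' => trim($node->textContent),
                        'country' => $country,
                        'medal'   => $medal,
                    );
                    $country = '';
                }
            }
        }
    }
    return $records;
}

print_r(extractMedalists($sampleHtml));
```

For the live page you would pass in the downloaded HTML (for example from file_get_contents) instead of $sampleHtml, and adjust the XPath expressions if the actual markup differs from what is assumed above. Because the medal comes from the cell position, two athletes in the same bronze cell both end up with 'Bronze'.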
Answer 1 (score: -2)
I suggest you take a look at this article, since what you really want is to extract from a table