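I need to fetch an HTML table using PHP. How can I do that?
Reference:
Table to parse: http://epsachaias.gr/?page_id=221&cat=4&group=151
I tried:
parseMailingAddyImacroOutput
But I get the following error:
Fatal error: Class 'team' not found.
Then I tried adding this code: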
<div class="row">
<div class="col-md-8">
<h1>Standings</h1>
<?php
$html = file_get_contents('http://epsachaias.gr/?page_id=221&cat=4&group=206');
$dom = new DOMDocument();
$internalErrors = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_use_internal_errors($internalErrors);
$tables = $dom->getElementsByTagName('table');
$trs = $tables->item(1)->getElementsByTagName('tr');
$output = [];
for ($itr = 0; $itr < $trs->length; $itr++)
{
$tds = $trs->item($itr)->getElementsByTagName('td');
if ($tds->length == 22)
{
$output[] = new team($tds);
}
}
var_dump($output);
?>
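But I got the same error again.
Answer 0 (score: 1)
Be aware that this solution depends on the site's current structure.
You can use a DOMDocument object to parse the HTML that you fetch with file_get_contents, as in this code: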
$html = file_get_contents('http://epsachaias.gr/?page_id=221&cat=4&group=151');
$dom = new DOMDocument();
$internalErrors = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_use_internal_errors($internalErrors);
$tables = $dom->getElementsByTagName('table');
$trs = $tables->item(1)->getElementsByTagName('tr');
$output = [];
for ($itr = 0; $itr < $trs->length; $itr++) {
$tds = $trs->item($itr)->getElementsByTagName('td');
if ($tds->length == 22) {
$row = [];
for ($itd = 0; $itd < $tds->length; $itd++) {
$row[] = $tds->item($itd)->textContent;
}
$output[] = $row;
}
}
var_dump($output);
$output is your final array.
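Because the parsing depends on the page layout, a slightly more defensive variant may be worth considering. The following is only a minimal sketch: the helper name extractRows, the sample markup and the cell count of 3 are assumptions made for illustration, not the site's real structure. With the real page you would pass the fetched HTML, table index 1 and 22 cells.

```php
<?php
// Hypothetical helper: returns the text of every row in the chosen table
// that has exactly $expectedCells <td> cells.
function extractRows(string $html, int $tableIndex, int $expectedCells): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $table = $dom->getElementsByTagName('table')->item($tableIndex);
    if ($table === null) {
        return []; // the table is missing, e.g. because the page layout changed
    }

    $rows = [];
    foreach ($table->getElementsByTagName('tr') as $tr) {
        $tds = $tr->getElementsByTagName('td');
        if ($tds->length !== $expectedCells) {
            continue; // skip header rows and rows with a different layout
        }
        $row = [];
        foreach ($tds as $td) {
            $row[] = trim($td->textContent);
        }
        $rows[] = $row;
    }
    return $rows;
}

// Made-up sample markup standing in for the real page.
$sample = '<table><tr><th>Pos</th><th>Team</th><th>Pts</th></tr>'
        . '<tr><td>1</td><td>Alpha</td><td>30</td></tr>'
        . '<tr><td>2</td><td>Beta</td><td>27</td></tr></table>';

$rows = extractRows($sample, 0, 3);
print_r($rows);
```

Returning an empty array when the table is missing avoids a fatal error from calling a method on null if the site changes.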
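If you want more convenient access to the parsed data, you can use classes and objects. For example, you can prepare one class for a group of statistics and another for a team, as in this code: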
class stats
{
private $n;
private $i;
private $or;
private $cplus;
private $cminus;
public function __construct($n, $i, $or, $cplus, $cminus)
{
$this->n = $n;
$this->i = $i;
$this->or = $or;
$this->cplus = $cplus;
$this->cminus = $cminus;
}
function getN()
{
return $this->n;
}
function getI()
{
return $this->i;
}
function getOr()
{
return $this->or;
}
function getCplus()
{
return $this->cplus;
}
function getCminus()
{
return $this->cminus;
}
}
class team
{
private $position;
private $name;
private $score;
private $ag;
private $dk;
private $together;
private $within;
private $out;
private $penalties;
public function __construct(DOMNodeList $nodes)
{
if ($nodes->length == 22) {
$this->position = (int) $nodes->item(0)->textContent;
$this->name = $nodes->item(2)->textContent;
$this->score = (int) $nodes->item(3)->textContent;
$this->ag = (int) $nodes->item(4)->textContent;
$this->dk = (int) $nodes->item(5)->textContent;
$this->together = new stats((int)$nodes->item(6)->textContent,
(int)$nodes->item(7)->textContent,
(int)$nodes->item(8)->textContent,
(int)$nodes->item(9)->textContent,
(int)$nodes->item(10)->textContent);
$this->within = new stats( (int)$nodes->item(11)->textContent,
(int)$nodes->item(12)->textContent,
(int)$nodes->item(13)->textContent,
(int)$nodes->item(14)->textContent,
(int)$nodes->item(15)->textContent);
$this->out = new stats( (int)$nodes->item(16)->textContent,
(int)$nodes->item(17)->textContent,
(int)$nodes->item(18)->textContent,
(int)$nodes->item(19)->textContent,
(int)$nodes->item(20)->textContent);
$this->penalties = (int) $nodes->item(21)->textContent;
} else {
throw new Exception("Incorrect input data");
}
}
public function getPosition()
{
return $this->position;
}
public function getName()
{
return $this->name;
}
public function getScore()
{
return $this->score;
}
public function getAg()
{
return $this->ag;
}
public function getDk()
{
return $this->dk;
}
public function getTogether()
{
return $this->together;
}
public function getWithin()
{
return $this->within;
}
public function getOut()
{
return $this->out;
}
public function getPenalties()
{
return $this->penalties;
}
}
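Since I assume you only want to read the parsed data, I provided only getters for the object properties. If you need to change them, you can add setters or make the properties public (and remove the methods you no longer need).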
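As a rough illustration of the setter option, here is a minimal sketch. The class name EditableStats is hypothetical and it holds a single property only to keep the example short; the same pattern would apply to each property of the classes above.

```php
<?php
// Hypothetical variant for illustration; this class is not part of the answer's code.
class EditableStats
{
    private $n;

    public function __construct(int $n)
    {
        $this->n = $n;
    }

    public function getN(): int
    {
        return $this->n;
    }

    // Setter: lets callers change the value after construction.
    public function setN(int $n): void
    {
        $this->n = $n;
    }
}

$stats = new EditableStats(21);
$stats->setN(22);
echo $stats->getN(), PHP_EOL;
```

A setter keeps the property private, so type checks or validation can be added later without changing the calling code.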
To build a collection from the parsed data, you can use this code:
$html = file_get_contents('http://epsachaias.gr/?page_id=221&cat=4&group=151');
$dom = new DOMDocument();
$internalErrors = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_use_internal_errors($internalErrors);
$tables = $dom->getElementsByTagName('table');
$trs = $tables->item(1)->getElementsByTagName('tr');
$output = [];
for ($itr = 0; $itr < $trs->length; $itr++) {
$tds = $trs->item($itr)->getElementsByTagName('td');
if ($tds->length == 22) {
$output[] = new team($tds);
}
}
var_dump($output);
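Previously $output was a plain array of values; now it is a collection of objects. For example, if you want the ΟΜΑΔΑ of the second team together with the value from its ΣΥΝΟΛΟ -> Ν cell, you just use this code: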
echo "{$output[1]->getName()}: {$output[1]->getTogether()->getN()}";
In your output you will get:
Άνω Καστρίτσι: 21
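The same getters can be used to walk the whole collection. The sketch below is self-contained, so it uses a stand-in class (TeamStub, a hypothetical name) with made-up names and scores instead of the team class and the live page.

```php
<?php
// Stand-in for the team class above, reduced to two getters so the
// example runs on its own; the names and numbers are made up.
class TeamStub
{
    private $name;
    private $score;

    public function __construct(string $name, int $score)
    {
        $this->name = $name;
        $this->score = $score;
    }

    public function getName(): string
    {
        return $this->name;
    }

    public function getScore(): int
    {
        return $this->score;
    }
}

$output = [new TeamStub('Alpha', 30), new TeamStub('Beta', 27)];

// Build one line per team: position in the array, name and score.
$lines = [];
foreach ($output as $index => $team) {
    $lines[] = sprintf('%d. %s (%d)', $index + 1, $team->getName(), $team->getScore());
}
echo implode(PHP_EOL, $lines), PHP_EOL;
```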