Need help extracting content from a PHP page with DOM / regex

Date: 2018-01-23 20:16:21

Tags: php regex dom

This is my code so far:

<?php
$start = date("d/m/y", strtotime('today'));
$end = date("d/m/y", strtotime('tomorrow'));

$opts = array(
'http'=>array(
'method'=>"GET",
'header'=>"User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:51.0) Gecko/20100101 Firefox/51.0"
));
$context = stream_context_create($opts);
$url = "http://www.hot.net.il/PageHandlers/LineUpAdvanceSearch.aspx?text=&channel=506&genre=-1&ageRating=-1&publishYear=-1&productionCountry=-1&startDate=$start&endDate=$end&pageSize=1";
$data = file_get_contents($url, false, $context);

$re = '/LineUpId=(.+\d)/';
preg_match($re, $data, $matches);

// The stream context created above is reused for the second request.
$url = "http://www.hot.net.il/PageHandlers//LineUpDetails.aspx?lcid=1037&luid=$matches[1]";
$data = file_get_contents($url, false, $context);
echo $data;
?> 

I'm trying to build a TV guide for a single channel that shows the current program.

Part of the HTML page:

<div class="GuideLineUpDetailsCenter">
    <a class="LineUpbold">Name of the Show</a>
    <br>
    <div class="LineUpDetailsTime">2018 22:45 - 23:30</div>
    <br>
    <div class="show">Information about the program</div>
    <br>
    <div class="LineUpbold">+14</div>
    <br>
</div>

I want to extract the content and do the following:

echo $LineUpbold;

echo $LineUpDetailsTime;

echo $show;

echo $LineUpbold;

1 Answer:

Answer 0 (score: 1)

Use a DOM parser with the appropriate XPath queries:

<?php

$data = <<<DATA
<div class="GuideLineUpDetailsCenter">
    <a class="LineUpbold">Name of the Show</a>
    <br>
    <div class="LineUpDetailsTime">2018 22:45 - 23:30</div>
    <br>
    <div class="show">Information about the program</div>
    <br>
    <div class="LineUpbold">+14</div>
    <br>
</div>
DATA;

# set up the dom
$dom = new DOMDocument();
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

# set up the xpath
$xpath = new DOMXPath($dom);

foreach ($xpath->query("//div[@class = 'GuideLineUpDetailsCenter']") as $container) {
    $name = $xpath->query("a[@class = 'LineUpbold']/text()", $container)->item(0);
    echo $name->nodeValue, PHP_EOL;

    $details = $xpath->query("div[@class = 'LineUpDetailsTime']/text()", $container)->item(0);
    echo $details->nodeValue, PHP_EOL;

    # and so on...

}

The code loads your string, searches for divs with the class GuideLineUpDetailsCenter, loops over them, and looks for the matching child elements inside each div.
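
The same idea can be wrapped in a small helper that returns all four fields at once. This is only a sketch: the function name `extractLineUp` and the array keys are made up for illustration, and it assumes the fetched page has the same structure as the snippet in the question. It uses `DOMXPath::evaluate()` with `string(...)`, so a missing element yields an empty string rather than an error, and it silences libxml warnings because real pages are rarely valid HTML.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the four
// fields of the first GuideLineUpDetailsCenter container, or null if the
// container is not found.
function extractLineUp(string $html): ?array
{
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $container = $xpath->query("//div[@class = 'GuideLineUpDetailsCenter']")->item(0);
    if ($container === null) {
        return null;
    }

    // Evaluate a relative XPath as a string and trim it; a missing node
    // gives an empty string.
    $text = function (string $expr) use ($xpath, $container): string {
        return trim($xpath->evaluate("string($expr)", $container));
    };

    return [
        'name'   => $text("a[@class = 'LineUpbold']"),      // show title (the <a>)
        'time'   => $text("div[@class = 'LineUpDetailsTime']"),
        'show'   => $text("div[@class = 'show']"),
        'rating' => $text("div[@class = 'LineUpbold']"),    // age rating (the <div>)
    ];
}
```

With the `$data` string fetched in the question, usage would look like `$info = extractLineUp($data);` followed by `echo $info['name'];`, after checking that `$info` is not null. The title and the age rating share the class `LineUpbold`, so the queries tell them apart by element name (`a` versus `div`).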