Avoiding duplicates when using Simple HTML DOM

Asked: 2017-10-02 00:16:52

Tags: php json web-scraping simple-html-dom

I am using Simple HTML DOM to scrape EPG data for a personal project.

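At the moment the code scrapes the data for every channel and dumps it into a JSON file. I filter that data with my own $channels array, which limits the output to the channels I specifically asked for and also lets me attach my own stream link to each one: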

$channels = array(
        "ITV1 London" => "URL 1",
);

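I can't work out how to stop each channel's data from being duplicated in the JSON output. I need the $channels lookup, because it both filters what appears in the final output and supplies the stream links that get added to it.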

if ($channels[$channel_name]) {
            $channel = array();

Full code:


<?php

// Include the php dom parser
include_once 'simple_html_dom.php';

header('Content-type: application/json');

// Create DOM from URL or file

$curl = curl_init();
curl_setopt($curl, CURLOPT_HEADER, 0);
curl_setopt($curl, CURLOPT_RETURNTRANSFER,1);
curl_setopt($curl, CURLOPT_URL, "http://tv24.co.uk");
$html=curl_exec($curl);
$dom = new simple_html_dom(null, true, true);
$html=$dom->load($html, true, true);

$channels = array(
    "ITV1 London" => "URL 1"
);

$data = array();

foreach($html->find('section div') as $ul)
{
    foreach($ul->find('div.channel-wrapper') as $show) {

        $channel_name = $show->find('h2.name')[0]->plaintext;

        if ($channels[$channel_name]) {
            $channel = array();

            $channel['channel'] =$channel_name ;
            $channel['logo'] = $show->find('span.logo img')[0]->src;
            $channel['thumb'] = explode("'", $show->find('div.program')[0]->style)[1];
            $channel['on-now'] = $show->find('span.title a')[0]->plaintext;
            $channel['on-now-time'] = $show->find('span.time')[0]->plaintext;
            $channel['on-now-description'] = $show->find('span.description')[0]->plaintext;
            $channel['up-next'] = $show->find('span.title a')[1]->plaintext;
            $channel['up-next-time'] = $show->find('span.time')[1]->plaintext;
            $channel['stream'] = $channels[$channel_name];

            $data['data'][] = $channel;
        }

    }

}
echo json_encode($data);

$myFile = "output.json";
$fh = fopen($myFile, 'w') or die("error");
$stringData = json_encode($data);
fwrite($fh, $stringData);
fclose($fh);

?>

1 Answer:

Answer 0 (score: 0)

Use an associative array for the channels, keyed by channel name.


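That way a repeated channel simply overwrites its previous entry instead of being appended again. A smarter approach is to keep track of the channels you have already handled and skip them, so the duplicate is never processed at all.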

$data['data'][$channel_name] = $channel;
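
Both ideas can be combined in a minimal sketch. It uses plain arrays in place of the scraped div.channel-wrapper elements so that it runs without Simple HTML DOM. The $scraped rows and the collectChannels() helper are hypothetical names made up for illustration, and the trim() call assumes the scraped plaintext may carry stray whitespace.

```php
<?php
// Hypothetical sample rows standing in for the scraped channel elements.
// "ITV1 London" appears twice, mimicking the duplicated output.
$scraped = array(
    array('channel' => 'ITV1 London', 'on-now' => 'News'),
    array('channel' => 'BBC One',     'on-now' => 'Drama'),
    array('channel' => 'ITV1 London', 'on-now' => 'News'),
);

// Requested channels mapped to their stream links, as in the question.
$channels = array(
    'ITV1 London' => 'URL 1',
);

// Illustrative helper, not part of Simple HTML DOM.
function collectChannels(array $rows, array $channels)
{
    $byName = array();

    foreach ($rows as $row) {
        // Assumption: names may have surrounding whitespace.
        $name = trim($row['channel']);

        // Ignore channels that were not requested. isset() also avoids
        // the "undefined index" notice for unknown names.
        if (!isset($channels[$name])) {
            continue;
        }

        // Skip channels that have already been processed.
        if (isset($byName[$name])) {
            continue;
        }

        $row['channel'] = $name;
        $row['stream'] = $channels[$name];
        $byName[$name] = $row;
    }

    // Drop the name keys so json_encode() emits a list, not an object.
    return array('data' => array_values($byName));
}

$data = collectChannels($scraped, $channels);
echo json_encode($data), "\n";
```

Without the array_values() call, json_encode() would turn the name-keyed array into a JSON object keyed by channel name. Whether that matters depends on what reads output.json.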