Get specific text from a web page

Date: 2013-04-08 09:17:00

Tags: php

I have this page, test, on another page, test1.

I run this PHP code to get some of the code from test1.

<?php
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTMLFile("http://inviatapenet.gethost.ro/sop/test1.php");

$xpath = new DOMXpath($doc);

$elements = $xpath->query("//*[@type='button']/@onclick");

if (!is_null($elements)) {
    foreach ($elements as $element) {
        $nodes = $element->childNodes;
        foreach ($nodes as $node) {
            echo $node->nodeValue. "\n";
        }
    }
}
?>

This is the result:

OnPlay('sop://broker.sopcast.com:3912/120704 cod ', ' eu - Nr.1 in tv ! ')
OnPlay('sop://broker.sopcast.com:3912/140601 cod ', ' eu - Nr.1 in tv ! ')     
OnPlay('sop://broker.sopcast.com:3912/124589 cod ', ' eu - Nr.1 tv') 
OnPlay('sop://broker.sopcast.com:3912/589994 cod ', ' eu - tv ') 
OnPlay('sop://broker.sopcast.com:3912/ cod ', ' eu - tv ')

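But out of all this data I only need the URL part, for example: sop://broker.sopcast.com:3912/140601

I need all of them.

How can I get rid of the extra text, or how can I extract only these values (sop://broker.sopcast.com:3912/140601, sop://broker.sopcast.com:3912/120704)?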

2 answers:

Answer 0: (score: 0)

I think you will need to do some string manipulation on the resulting onclick event handler text.

<?php
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTMLFile("http://inviatapenet.gethost.ro/sop/test1.php");

$xpath = new DOMXpath($doc);

$elements = $xpath->query("//*[@type='button']/@onclick");
$value_text = array();
$index = 0;
if (!is_null($elements)) {
    foreach ($elements as $element) {
        $nodes = $element->childNodes;
        foreach ($nodes as $node) {
            $value_text[$index++] = getRequiredValue($node->nodeValue);
        }
    }
    // $value_text now holds all required values as an array
    print_r($value_text);
}

// Returns the text between the 8-character prefix "OnPlay('" and the "cod " marker.
function getRequiredValue($on_play)
{
    $pos = strpos($on_play, 'cod ');
    if ($pos === false) {
        // Marker not found: the string does not have the expected format
        return null;
    }
    // Skip "OnPlay('" (8 characters) and stop just before "cod "
    $updated_on_play = substr($on_play, 8, $pos - 8);
    return trim($updated_on_play);
}
?>
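
To try the string-slicing idea without fetching the remote page, here is a minimal sketch that runs on strings copied from the question's output. The helper name `extractSopUrl` is hypothetical, and it assumes every handler starts with `OnPlay('` and contains the ` cod ` marker after the URL.

```php
<?php
// Hypothetical helper: returns the text between "OnPlay('" and the " cod " marker,
// or null when the string does not have that shape.
function extractSopUrl(string $onPlay): ?string
{
    $prefix = "OnPlay('";
    $start = strpos($onPlay, $prefix);
    if ($start === false) {
        return null;
    }
    $start += strlen($prefix);

    // Look for the marker only after the prefix
    $end = strpos($onPlay, ' cod ', $start);
    if ($end === false) {
        return null;
    }
    return trim(substr($onPlay, $start, $end - $start));
}

// Sample handler strings taken from the output shown in the question
$samples = [
    "OnPlay('sop://broker.sopcast.com:3912/120704 cod ', ' eu - Nr.1 in tv ! ')",
    "OnPlay('sop://broker.sopcast.com:3912/140601 cod ', ' eu - Nr.1 in tv ! ')",
    "OnPlay('sop://broker.sopcast.com:3912/ cod ', ' eu - tv ')",
];

$urls = array_map('extractSopUrl', $samples);
print_r($urls);
```

Note that the last sample has no channel number, so the extracted value is just `sop://broker.sopcast.com:3912/`; you may want to filter such entries out.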

Answer 1: (score: 0)

If the string always has this format, you can simply use explode to get the sop:// URL.

<?php

header('Content-Type: text/plain; charset=UTF-8');


libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTMLFile("http://inviatapenet.gethost.ro/sop/test1.php");

$xpath = new DOMXpath($doc);

$elements = $xpath->query("//*[@type='button']/@onclick");

if (!is_null($elements)) {
    foreach ($elements as $element) {
        $nodes = $element->childNodes;
        foreach ($nodes as $node) {
            echo $node->nodeValue. "\n";
            $content = $node->nodeValue;
            $content = explode("'", $content, 3);
            $content = explode(" ", $content[1], 2);
            $sop = $content[0];
            unset($content);
            var_dump($sop);
        }
    }
}
?>
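
If the format is less predictable, a regular expression is another option. This is not what the answer above uses; it is a sketch of a different technique, with a hypothetical helper name `findSopUrls`, assuming a URL never contains whitespace or quote characters.

```php
<?php
// Hypothetical helper: collects every sop:// URL found in a piece of text.
function findSopUrls(string $text): array
{
    // "sop://" followed by one or more characters that are neither whitespace nor quotes
    preg_match_all('~sop://[^\s\'"]+~', $text, $matches);
    return $matches[0];
}

// Sample handler string taken from the output shown in the question
$onclick = "OnPlay('sop://broker.sopcast.com:3912/140601 cod ', ' eu - Nr.1 in tv ! ')";
$found = findSopUrls($onclick);
print_r($found);
```

In the loop above, `findSopUrls($node->nodeValue)` could stand in for the two explode calls.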