Question

I'm using a PHP script that retrieves an RSS feed and sends it back as JSON.

It starts with:

$feed = new DOMDocument();
$feed->load($_GET['url']);

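The feed I'm using looks like this (URL: RSS FEED)

and it has a nice <enclosure url=""> element.

I find that I can't access this data. In fact, when I var_dump($feed); I see no trace of the enclosure, nor of https://MYURL.COM/MYPATH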

So the question is: why, and how? :-)

Thanks!

Edit

Here is the full script and the var dump content:

<?php
header('Content-Type: application/json');
$feed = new DOMDocument();
$feed->load($_GET['url']);

$json = array();

$json['title'] =  $feed->getElementsByTagName('channel')->item(0)->getElementsByTagName('title')->item(0)->firstChild->nodeValue;
$json['description'] = $feed->getElementsByTagName('channel')->item(0)->getElementsByTagName('description')->item(0)->firstChild->nodeValue;
$json['link'] =  $feed->getElementsByTagName('channel')->item(0)->getElementsByTagName('link')->item(0)->firstChild->nodeValue;


$items = $feed->getElementsByTagName('channel')->item(0)->getElementsByTagName('item');
$json['items'] = array();
$i = 0;
foreach($items as $item) {
   $json['items'][$i]['title'] = $item->getElementsByTagName('title')->item(0)->firstChild->nodeValue;
   $json['items'][$i]['description'] = $item->getElementsByTagName('description')->item(0)->firstChild->nodeValue;
   $json['items'][$i]['pubdate'] = $item->getElementsByTagName('pubDate')->item(0)->firstChild->nodeValue;
   $json['items'][$i]['guid'] = $item->getElementsByTagName('guid')->item(0)->firstChild->nodeValue;
   $json['items'][$i]['link'] = $item->getElementsByTagName('link')->item(0)->firstChild->nodeValue;
   //$json['items'][$i]['url'] = $item->getELementsByTagName('nodeValue')->item(0)->firstChild->getAttribute('url');

    $i++;
}

echo json_encode($json);
?>

when the URL https://www.dealabs.com/rss/new.xml is passed in the params.

Var dump of $feed (too long to include here): pastebin

Answer 1

Just as a quick demonstration of how to use DOMDocument and fetch data from an XML document...

$feed = new DOMDocument();
$feed->load($_GET['url']);

$xpath=new DOMXPath($feed);

foreach ( $xpath->query("//enclosure") as $enclosure ) {
    echo "Element=".$feed->saveXML($enclosure)."\n";
    var_dump($enclosure);
    echo "Url=".$enclosure->getAttribute("url")."\n";
}

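As you can see, I'm using XPath to fetch the enclosure elements from the feed, and first print out the XML (you have to use the document's saveXML method to output the XML). The next line shows what var_dump gives you, which is basically a lot of the internals that support the DOM structure. Finally, it prints the value of the url attribute.

Using what I could get from your sample data (it's always better to include the data rather than an image), the output gives...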

Element=<enclosure url="https://something/url"/>
/home/nigel/workspace/PHPTest/XML/test2.php:13:
class DOMElement#3 (18) {
  public $tagName =>
  string(9) "enclosure"
  public $schemaTypeInfo =>
  NULL
  public $nodeName =>
  string(9) "enclosure"
  public $nodeValue =>
  string(0) ""
  public $nodeType =>
  int(1)
  public $parentNode =>
  string(22) "(object value omitted)"
  public $childNodes =>
  string(22) "(object value omitted)"
  public $firstChild =>
  NULL
  public $lastChild =>
  NULL
  public $previousSibling =>
  string(22) "(object value omitted)"
  public $nextSibling =>
  string(22) "(object value omitted)"
  public $attributes =>
  string(22) "(object value omitted)"
  public $ownerDocument =>
  string(22) "(object value omitted)"
  public $namespaceURI =>
  NULL
  public $prefix =>
  string(0) ""
  public $localName =>
  string(9) "enclosure"
  public $baseURI =>
  string(40) "/home/nigel/workspace/PHPTest/XML/t1.xml"
  public $textContent =>
  string(0) ""
}
Url=https://something/url
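
To tie this back to the script in the question, here is a minimal sketch of reading the enclosure url per item and putting it into the JSON. The function name feedItemsWithEnclosure and the sample feed are made up for illustration (the real feed's structure may differ), and it uses loadXML on an inline string only so that it runs without a network call; with a real feed you would keep $feed->load($url).

```php
<?php
// Hypothetical helper (not part of the original script): returns one array
// per <item>, including the enclosure url when the item has one.
function feedItemsWithEnclosure(string $xml): array
{
    $feed = new DOMDocument();
    $feed->loadXML($xml); // for a remote feed, use $feed->load($url) instead

    $items = array();
    foreach ($feed->getElementsByTagName('item') as $item) {
        // item(0) returns null when the item has no <enclosure> element
        $enclosure = $item->getElementsByTagName('enclosure')->item(0);

        $items[] = array(
            'title' => $item->getElementsByTagName('title')->item(0)->nodeValue,
            // the url lives in an attribute, not in a child text node,
            // so firstChild->nodeValue cannot reach it
            'url'   => $enclosure !== null ? $enclosure->getAttribute('url') : null,
        );
    }
    return $items;
}

// Made-up sample data standing in for the real RSS feed.
$xml = <<<XML
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>First deal</title>
      <enclosure url="https://example.com/images/1.jpg" type="image/jpeg" length="0"/>
    </item>
    <item>
      <title>Second deal</title>
    </item>
  </channel>
</rss>
XML;

echo json_encode(feedItemsWithEnclosure($xml)), "\n";
```

The null check matters because not every item is guaranteed to carry an enclosure; calling getAttribute on a missing element would be a fatal error.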

Answer 2

No doubt you have this worked out by now, but if not, perhaps the following might be useful. Given the url below and a couple of small helper functions, getchild and getvalue, you can simply iterate through each item in the XML/RSS feed like this, choosing whatever attributes from enclosure you wish to capture. In reality you would more than likely want to make the helper functions more robust, but you should get the idea.

define('BR','<br />');
$url='https://www.dealabs.com/rss/new.xml';

function getchild( $node,$index ){
    $child=$node->childNodes->item( $index );
    if( !$child )throw new Exception( __FUNCTION__ .' -> Unable to find child node',$index);
    return $child;
}
function getvalue( $node ){
    return $node->nodeValue;
}

try{

    libxml_use_internal_errors( true );
    $dom=new DOMDocument;
    $dom->preserveWhiteSpace = false;
    $dom->validateOnParse = false;
    $dom->standalone=true;
    $dom->strictErrorChecking=false;
    $dom->substituteEntities=true;
    $dom->recover=true;
    $dom->formatOutput=false;
    $dom->load( $url );

    $errors = libxml_get_errors();
    libxml_clear_errors();


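    if( !empty( $errors ) ) {
        /* libxml_get_errors() returns LibXMLError objects, so collect their messages before joining */
        $messages = array_map( function( $error ){ return trim( $error->message ); }, $errors );
        throw new Exception( implode( PHP_EOL, $messages ) );
    }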

    $items=$dom->getElementsByTagName('item');

    /* a DOMNodeList object is never empty(), so test its length instead */
    if( $items->length > 0 ){

        foreach( $items as $index => $item ){
            try{

                $title=getvalue( getchild( $item, 0 ) );
                $link=getvalue( getchild( $item,1 ) );
                $description=getvalue( getchild( $item,2 ) );
                $content=getvalue( getchild( $item,3 ) );
                $guid=getvalue( getchild( $item,4 ) );
                $pubDate=getvalue( getchild( $item,5 ) );
                $enclosure=getchild( $item, 6 );

                /* note: an out-of-range index, e.g. getchild( $item, 69 ), throws and skips the rest of this item */

                /* elected to get the url only but same method for other attributes */
                echo $enclosure->getAttribute('url').BR;

            }catch( Exception $e ){
                printf( 'Caught Exception: %s @ index %d<br />', $e->getMessage(), $e->getCode() );
                continue;
            }
        }
    }
    $dom=null;
}catch( Exception $e ){
    printf( 'Caught Exception -> Trace:%s Message:%s Code:%d', $e->getTraceAsString(), $e->getMessage(), $e->getCode() );
}
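
As one possible way to make the helpers more robust, the sketch below looks up a child by tag name instead of by position, so it keeps working if the feed reorders or omits elements. getChildByName is a hypothetical helper, not part of DOMDocument, and the inline XML is made-up sample data.

```php
<?php
/**
 * Hypothetical helper: returns the first direct child element of $node
 * named $name, or null when there is none.
 */
function getChildByName(DOMNode $node, string $name): ?DOMElement
{
    foreach ($node->childNodes as $child) {
        // skip text/whitespace nodes and elements with a different name
        if ($child instanceof DOMElement && $child->nodeName === $name) {
            return $child;
        }
    }
    return null;
}

$dom = new DOMDocument();
$dom->loadXML(
    '<rss><channel><item>'
    . '<title>Example deal</title>'
    . '<enclosure url="https://example.com/image.jpg"/>'
    . '</item></channel></rss>'
);

foreach ($dom->getElementsByTagName('item') as $item) {
    $title     = getChildByName($item, 'title');
    $enclosure = getChildByName($item, 'enclosure');

    echo ($title ? $title->nodeValue : '(no title)'), "\n";
    echo ($enclosure ? $enclosure->getAttribute('url') : '(no enclosure)'), "\n";
}
```

Returning null rather than throwing lets the caller decide per field whether a missing element is an error or simply optional.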

DOMDocument doesn't see <enclosure url="">
