Question

I'm trying to read an RSS feed using PHP. For some reason, it can't read this content tag.

<a10:content type="text/xml">...</a10:content>

Here is an example of what the feed might look like:

<rss version="2.0" xmlns:a10="http://www.w3.org/2005/Atom">
    <channel>
        <title>Main title</title>
        <description>Some description</description>
        <managingEditor>john.doe@example.com</managingEditor>
        <category>Some category</category>
        <item>
            <guid isPermaLink="false">1</guid>
            <link>https://example.com/1</link>
            <title>Some title 1</title>
            <a10:updated>2017-05-30T13:20:22+02:00</a10:updated>
            <a10:content type="text/xml">
                <Location>San diego</Location>
                <PublishedOn>2016-10-21T11:21:07</PublishedOn>
                <Body>Lorem ipsum dolar</Body>
                <JobCountry>USA</JobCountry>
            </a10:content>
        </item>
        <item>
            <guid isPermaLink="false">1</guid>
            <link>https://example.com/2</link>
            <title>Some title 2</title>
            <a10:updated>2017-05-30T13:20:22+02:00</a10:updated>
            <a10:content type="text/xml">
                <Location>Detroit</Location>
                <PublishedOn>2016-10-21T11:21:07</PublishedOn>
                <Body>Lorem ipsum dolar</Body>
                <JobCountry>USA</JobCountry>
            </a10:content>
        </item>
        <item>
            <guid isPermaLink="false">1</guid>
            <link>https://example.com/3</link>
            <title>Some title 3</title>
            <a10:updated>2017-05-30T13:20:22+02:00</a10:updated>
            <a10:content type="text/xml">
                <Location>Los Angeles</Location>
                <PublishedOn>2016-10-21T11:21:07</PublishedOn>
                <Body>Lorem ipsum dolar</Body>
                <JobCountry>USA</JobCountry>
            </a10:content>
        </item>
    </channel>
</rss>

Here is my code.

    $url = "http://example.com/RSSFeed";
    $xml = simplexml_load_file($url);

    foreach ($xml->channel as $x) {
        foreach ($x->item as $item) {

            dd($item);
        }
    }

Which outputs

    SimpleXMLElement {#111 ▼
      +"guid": "1"
      +"link": "https://example.com"
      +"title": "Some title"
    }

This is my expected output

SimpleXMLElement {#111 ▼
  +"guid": "1"
  +"link": "https://example.com"
  +"title": "Some title"
  +"content" {
    0 => {
        +"Location": "San Diego"
        +"PublishedOn": "2016-10-21T11:21:07"
        +"Body": "Lorem ipsum dolar"
        +"JobCountry": "USA"
    }
    1 => {
        +"Location": "Detroit"
        +"PublishedOn": "2016-10-21T11:21:07"
        +"Body": "Lorem ipsum dolar"
        +"JobCountry": "USA"
    }
    2 => {
        +"Location": "Los Angeles"
        +"PublishedOn": "2016-10-21T11:21:07"
        +"Body": "Lorem ipsum dolar"
        +"JobCountry": "USA"
    }
  }
}

Does anyone have a solution?

Answer 1

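You should access the element through its namespace. Here we use DOMDocument to get the desired output: its getElementsByTagNameNS method takes the namespace URI and the local name of the element you want, and returns the matching elements.

If you would rather use simplexml_load_string, you can check this out: PHP code demo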

Try this code snippet here

<?php

ini_set('display_errors', 1);

libxml_use_internal_errors(true);   
$string=<<<HTML
<rss version="2.0" xmlns:a10="http://www.w3.org/2005/Atom">
    <channel>
        <title>Main title</title>
        <description>Some description</description>
        <managingEditor>john.doe@example.com</managingEditor>
        <category>Some category</category>
        <item>
            <guid isPermaLink="false">1</guid>
            <link>https://example.com</link>
            <title>Some title</title>
            <a10:updated>2017-05-30T13:20:22+02:00</a10:updated>
            <a10:content type="text/xml">
                <Location>Detroit</Location>
                <PublishedOn>2016-10-21T11:21:07</PublishedOn>
                <Body>Lorem ipsum dolar</Body>
                <JobCountry>USA</JobCountry>
            </a10:content>
        </item>
    </channel>
</rss>
HTML;
$completeData = array();
$domDocument = new DOMDocument();
$domDocument->loadXML($string);
// Match on the namespace URI rather than the "a10" prefix, which a feed is free to change.
$results = $domDocument->getElementsByTagNameNS("http://www.w3.org/2005/Atom", "content");
foreach ($results as $result)
{
    // Reset per <a10:content> element so values from earlier items do not accumulate.
    $data = array();
    foreach ($result->childNodes as $node)
    {
        // Skip whitespace text nodes; keep only the child elements.
        if ($node instanceof DOMElement)
        {
            $data[] = $node->nodeValue;
        }
    }
    $completeData[] = $data;
}
print_r($completeData);
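
For the simplexml_load_string route mentioned in this answer, here is a minimal sketch rather than a definitive implementation. It assumes the feed has the same shape as the sample in the question, uses an inline string instead of the real URL, and the names $feed, $atomNs, $jobs and $row are made up for illustration. The idea is that SimpleXML only exposes namespaced elements through children() called with the namespace URI, and that the elements inside <a10:content> carry no namespace, so they need a plain children() call.

```php
<?php
// Inline sample modelled on the feed in the question (one item only).
$feed = <<<XML
<rss version="2.0" xmlns:a10="http://www.w3.org/2005/Atom">
    <channel>
        <item>
            <guid isPermaLink="false">1</guid>
            <link>https://example.com/1</link>
            <title>Some title 1</title>
            <a10:updated>2017-05-30T13:20:22+02:00</a10:updated>
            <a10:content type="text/xml">
                <Location>San diego</Location>
                <PublishedOn>2016-10-21T11:21:07</PublishedOn>
                <Body>Lorem ipsum dolar</Body>
                <JobCountry>USA</JobCountry>
            </a10:content>
        </item>
    </channel>
</rss>
XML;

$atomNs = 'http://www.w3.org/2005/Atom';
$rss = simplexml_load_string($feed);

$jobs = array(); // hypothetical name for the collected rows
foreach ($rss->channel->item as $item) {
    // Elements in the Atom namespace are only reachable via children($namespaceUri).
    $atom = $item->children($atomNs);

    $row = array(
        'title'   => (string) $item->title,
        'updated' => (string) $atom->updated,
    );

    // Location, PublishedOn, ... have no namespace, so switch back with a bare children().
    foreach ($atom->content->children() as $name => $value) {
        $row[$name] = (string) $value;
    }
    $jobs[] = $row;
}

print_r($jobs);
```

Each row is keyed by element name, which is closer to the structure the question asks for than a plain list of values.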

Answer 2

Here is my working solution

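$xml = file_get_contents("https://example.com/RSSFeed");

// Strip the "a10:" prefix so <content> becomes a plain, un-namespaced element
// that SimpleXML exposes as a regular property.
$string = str_replace(array("<a10:content","</a10:content>"), array("<content","</content>"), $xml);

$sxe = new \SimpleXMLElement($string);

// Iterating $sxe directly would yield <channel>; the items sit one level below it.
foreach ($sxe->channel->item as $item) {

     dd($item);

}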
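
To try the prefix-replacement idea without hitting a real URL, here is a small sketch under stated assumptions: the inline $xml string stands in for the downloaded feed, and $locations is a made-up name for the collected values. Note that this is a text-level rename, so it only works while the feed keeps the literal "a10" prefix.

```php
<?php
// Inline sample standing in for the downloaded feed (assumed to match the question's shape).
$xml = <<<XML
<rss version="2.0" xmlns:a10="http://www.w3.org/2005/Atom">
    <channel>
        <item>
            <title>Some title 1</title>
            <a10:content type="text/xml">
                <Location>San diego</Location>
                <JobCountry>USA</JobCountry>
            </a10:content>
        </item>
        <item>
            <title>Some title 2</title>
            <a10:content type="text/xml">
                <Location>Detroit</Location>
                <JobCountry>USA</JobCountry>
            </a10:content>
        </item>
    </channel>
</rss>
XML;

// Rename <a10:content> to <content> in the raw text before parsing.
$string = str_replace(
    array('<a10:content', '</a10:content>'),
    array('<content', '</content>'),
    $xml
);

$sxe = new SimpleXMLElement($string);

$locations = array(); // hypothetical name
foreach ($sxe->channel->item as $item) {
    // <content> no longer has a namespace, so it and its children are plain properties.
    $locations[] = (string) $item->content->Location;
}

print_r($locations);
```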

Answer 3

First of all, don't use SimpleXML, it's rubbish! You're better off using DOMDocument.

http://php.net/manual/en/class.domdocument.php

<?php

$dom = new DOMDocument();
$dom->loadXML($xml);


$items = $dom->getElementsByTagName('item');
$array = array();

foreach($items as $item)
{
    $title = $item->getElementsByTagName('title')->item(0)->nodeValue;
    $link = $item->getElementsByTagName('link')->item(0)->nodeValue;
    $updated = $item->getElementsByTagName('updated')->item(0)->nodeValue;
    $location = $item->getElementsByTagName('Location')->item(0)->nodeValue;
    $pub = $item->getElementsByTagName('PublishedOn')->item(0)->nodeValue;
    $body = $item->getElementsByTagName('Body')->item(0)->nodeValue;
    $job = $item->getElementsByTagName('JobCountry')->item(0)->nodeValue;

    $array[] = [
        'title' => $title,
        'link' => $link, 
        'updated' => $updated, 
        'Location' => $location, 
        'PublishedOn' => $pub, 
        'Body' => $body, 
        'JobCountry' => $job, 
    ];
}

var_dump($array);

Which will give you:

array(7) { ["title"]=> string(12) "Some title 1" ["link"]=> string(21) "https://example.com/1" ["updated"]=> string(25) "2017-05-30T13:20:22+02:00" ["Location"]=> string(9) "San diego" ["PublishedOn"]=> string(19) "2016-10-21T11:21:07" ["Body"]=> string(17) "Lorem ipsum dolar" ["JobCountry"]=> string(3) "USA" }

See it here! https://3v4l.org/E0UXJ

Now that it works, let's tidy it up by creating a convenience function:

function domToArray($item, array $cols)
{
    $array = [];
    foreach ($cols as $col) {
        // item(0) is null when the element is absent, so guard before reading nodeValue.
        $node = $item->getElementsByTagName($col)->item(0);
        $array[$col] = $node ? $node->nodeValue : null;
    }
    return $array;
}

$dom = new DOMDocument();
$dom->loadXML($xml);

$items = $dom->getElementsByTagName('item');
$array = array();

$fields = [
        'title',
        'link', 
        'updated', 
        'Location', 
        'PublishedOn', 
        'Body', 
        'JobCountry', 
    ];

foreach($items as $item)
{
    $array[] = domToArray($item, $fields);
}

var_dump($array);

Same output, see here: https://3v4l.org/W6HM3

PHP RSS feed reader cannot read the <a10:content type="text/xml"> tag
