I am trying to read an RSS feed with PHP. For some reason, it cannot read this content tag.
<a10:content type="text/xml">...</a10:content>
Here is an example of what the feed and its items might look like
<rss version="2.0" xmlns:a10="http://www.w3.org/2005/Atom">
<channel>
<title>mMin title</title>
<description>Some description</description>
<managingEditor>john.doe@example.com</managingEditor>
<category>Some category</category>
<item>
<guid isPermaLink="false">1</guid>
<link>https://example.com/1</link>
<title>Some title 1</title>
<a10:updated>2017-05-30T13:20:22+02:00</a10:updated>
<a10:content type="text/xml">
<Location>San diego</Location>
<PublishedOn>2016-10-21T11:21:07</PublishedOn>
<Body>Lorem ipsum dolar</Body>
<JobCountry>USA</JobCountry>
</a10:content>
</item>
<item>
<guid isPermaLink="false">1</guid>
<link>https://example.com/2</link>
<title>Some title 2</title>
<a10:updated>2017-05-30T13:20:22+02:00</a10:updated>
<a10:content type="text/xml">
<Location>Detroit</Location>
<PublishedOn>2016-10-21T11:21:07</PublishedOn>
<Body>Lorem ipsum dolar</Body>
<JobCountry>USA</JobCountry>
</a10:content>
</item>
<item>
<guid isPermaLink="false">1</guid>
<link>https://example.com/3</link>
<title>Some title 3</title>
<a10:updated>2017-05-30T13:20:22+02:00</a10:updated>
<a10:content type="text/xml">
<Location>Los Angeles</Location>
<PublishedOn>2016-10-21T11:21:07</PublishedOn>
<Body>Lorem ipsum dolar</Body>
<JobCountry>USA</JobCountry>
</a10:content>
</item>
</channel>
</rss>
This is my code.
$url = "http://example.com/RSSFeed";
$xml = simplexml_load_file($url);
foreach ($xml->channel as $x) {
foreach ($x->item as $item) {
dd($item);
}
}
Which outputs
SimpleXMLElement {#111 ▼
+"guid": "1"
+"link": "https://example.com"
+"title": "Some title"
}
This is my expected output
SimpleXMLElement {#111 ▼
+"guid": "1"
+"link": "https://example.com"
+"title": "Some title"
+"content" {
0 => {
+"Location": "San Diego"
+"PublishedOn": "2016-10-21T11:21:07"
+"Body": "Lorem ipsum dolar"
+"JobCountry": "USA"
}
1 => {
+"Location": "Detroit"
+"PublishedOn": "2016-10-21T11:21:07"
+"Body": "Lorem ipsum dolar"
+"JobCountry": "USA"
}
2 => {
+"Location": "Los Angeles"
+"PublishedOn": "2016-10-21T11:21:07"
+"Body": "Lorem ipsum dolar"
+"JobCountry": "USA"
}
}
}
Does anyone have a solution?
Answer 0: (score: 1)
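You need to access the element through its namespace. Here we use DOMDocument to get the desired output. DOMDocument's getElementsByTagNameNS function takes the namespace URI and the local name of the element you want, which is enough to reach the content element.
If you prefer to use simplexml_load_string, you can check this out. PHP code demo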
<?php
ini_set('display_errors', 1);
libxml_use_internal_errors(true);
$string=<<<HTML
<rss version="2.0" xmlns:a10="http://www.w3.org/2005/Atom">
<channel>
<title>mMin title</title>
<description>Some description</description>
<managingEditor>john.doe@example.com</managingEditor>
<category>Some category</category>
<item>
<guid isPermaLink="false">1</guid>
<link>https://example.com</link>
<title>Some title</title>
<a10:updated>2017-05-30T13:20:22+02:00</a10:updated>
<a10:content type="text/xml">
<Location>Detroit</Location>
<PublishedOn>2016-10-21T11:21:07</PublishedOn>
<Body>Lorem ipsum dolar</Body>
<JobCountry>USA</JobCountry>
</a10:content>
</item>
</channel>
</rss>
HTML;
$data=array();
$completeData=array();
$domDocument = new DOMDocument();
$domDocument->loadXML($string);
$results=$domDocument->getElementsByTagNameNS("http://www.w3.org/2005/Atom", "content");
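foreach($results as $result)
{
    // Reset for every <a10:content> element so values from earlier items are not carried over.
    $data=array();
    if($result instanceof DOMElement && $result->tagName=="a10:content")
    {
        foreach($result->childNodes as $node)
        {
            // Skip whitespace text nodes; keep only the child elements.
            if($node instanceof DOMElement)
            {
                $data[]=$node->nodeValue;
            }
        }
    }
    $completeData[]=$data;
}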
print_r($completeData);
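For the SimpleXML route mentioned above, a minimal sketch could look like the following. It assumes the feed has the same shape as the sample in the question (trimmed here to one item); the variable names `$feed`, `$atomNs` and `$jobs` are made up for illustration. The idea is that `children()` with the Atom namespace URI exposes `a10:content`, and a second `children()` call without arguments switches back to the un-namespaced elements inside it.

```php
<?php
// Sketch only: a trimmed copy of the sample feed from the question.
$feed = <<<XML
<rss version="2.0" xmlns:a10="http://www.w3.org/2005/Atom">
<channel>
<item>
<guid isPermaLink="false">1</guid>
<link>https://example.com/1</link>
<title>Some title 1</title>
<a10:content type="text/xml">
<Location>San diego</Location>
<PublishedOn>2016-10-21T11:21:07</PublishedOn>
<Body>Lorem ipsum dolar</Body>
<JobCountry>USA</JobCountry>
</a10:content>
</item>
</channel>
</rss>
XML;

$atomNs = 'http://www.w3.org/2005/Atom'; // URI bound to the a10 prefix
$rss = simplexml_load_string($feed);

$jobs = [];
foreach ($rss->channel->item as $item) {
    // Look at the item's children in the Atom namespace to reach a10:content.
    $content = $item->children($atomNs)->content;
    // Location, Body, etc. have no namespace, so switch back with children().
    $fields = $content->children();
    $jobs[] = [
        'title'       => (string) $item->title,
        'link'        => (string) $item->link,
        'Location'    => (string) $fields->Location,
        'PublishedOn' => (string) $fields->PublishedOn,
        'Body'        => (string) $fields->Body,
        'JobCountry'  => (string) $fields->JobCountry,
    ];
}

print_r($jobs);
```

Using the namespace URI rather than the `a10` prefix should keep this working even if the feed later binds the Atom namespace to a different prefix.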
Answer 1: (score: 1)
Here is my working solution
$xml = file_get_contents("https://example.com/RSSFeed");
$string = str_replace(array("<a10:content","</a10:content>"), array("<content","</content>"), $xml);
$sxe = new \SimpleXMLElement($string);
$jobs = array();
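// Iterate over the <item> elements; iterating $sxe directly would only yield <channel>.
foreach ($sxe->channel->item as $item) {
dd($item);
}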
Answer 2: (score: 0)
First of all, don't use SimpleXML, it's rubbish! You are better off using DOMDocument.
http://php.net/manual/en/class.domdocument.php
<?php
$dom = new DOMDocument();
// $xml is assumed to hold the feed's XML as a string
$dom->loadXML($xml);
$items = $dom->getElementsByTagName('item');
$array = array();
foreach($items as $item)
{
$title = $item->getElementsByTagName('title')->item(0)->nodeValue;
$link = $item->getElementsByTagName('link')->item(0)->nodeValue;
$updated = $item->getElementsByTagName('updated')->item(0)->nodeValue;
$location = $item->getElementsByTagName('Location')->item(0)->nodeValue;
$pub = $item->getElementsByTagName('PublishedOn')->item(0)->nodeValue;
$body = $item->getElementsByTagName('Body')->item(0)->nodeValue;
$job = $item->getElementsByTagName('JobCountry')->item(0)->nodeValue;
$array[] = [
'title' => $title,
'link' => $link,
'updated' => $updated,
'Location' => $location,
'PublishedOn' => $pub,
'Body' => $body,
'JobCountry' => $job,
];
}
var_dump($array);
This will give you entries like this:
array(7) { ["title"]=> string(12) "Some title 1" ["link"]=> string(21) "https://example.com/1" ["updated"]=> string(25) "2017-05-30T13:20:22+02:00" ["Location"]=> string(9) "San diego" ["PublishedOn"]=> string(19) "2016-10-21T11:21:07" ["Body"]=> string(17) "Lorem ipsum dolar" ["JobCountry"]=> string(3) "USA" }
See it here! https://3v4l.org/E0UXJ
Now that it works, let's tidy it up by creating a convenience function:
function domToArray($item, array $cols)
{
$array = [];
foreach ($cols as $col) {
$val = $item->getElementsByTagName($col)->item(0)->nodeValue;
$array[$col] = $val;
}
return $array;
}
$dom = new DOMDocument();
$dom->loadXML($xml);
$items = $dom->getElementsByTagName('item');
$array = array();
$fields = [
'title',
'link',
'updated',
'Location',
'PublishedOn',
'Body',
'JobCountry',
];
foreach($items as $item)
{
$array[] = domToArray($item, $fields);
}
var_dump($array);
Same output, see here: https://3v4l.org/W6HM3
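One thing to watch with the helper above: if an item lacks one of the requested elements, `item(0)` returns null and reading `nodeValue` on it raises a warning (a notice on older PHP versions). A possible null-safe variant is sketched below; the name `domToArraySafe`, the `Salary` column, and the trimmed sample feed are hypothetical and only there for illustration.

```php
<?php
// Hypothetical variant of domToArray(): elements missing from an item
// become null instead of triggering a warning on ->nodeValue.
function domToArraySafe(DOMElement $item, array $cols): array
{
    $row = [];
    foreach ($cols as $col) {
        // item(0) is null when the item contains no element with that name.
        $node = $item->getElementsByTagName($col)->item(0);
        $row[$col] = $node !== null ? trim($node->nodeValue) : null;
    }
    return $row;
}

// Trimmed sample feed; the second item deliberately has no <a10:content>.
$xml = <<<XML
<rss version="2.0" xmlns:a10="http://www.w3.org/2005/Atom">
<channel>
<item>
<title>Some title 1</title>
<a10:content type="text/xml">
<Location>San diego</Location>
</a10:content>
</item>
<item>
<title>Some title 2</title>
</item>
</channel>
</rss>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

$rows = [];
foreach ($dom->getElementsByTagName('item') as $item) {
    // 'Salary' is not in the feed at all; it is requested to show the null case.
    $rows[] = domToArraySafe($item, ['title', 'Location', 'Salary']);
}

var_dump($rows);
```

Returning null for absent fields lets the caller decide whether a missing value is an error or simply optional data.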