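I'm scraping HTML with DOM to build a custom RSS feed from an external website. I have all the values I need in an array named $jobs. I can print the values like this: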
function jobscrape($title, $link, $root, $description, $job_location) {
    $jobs = array();
    $html = file_get_contents($link);
    $doc = new DOMDocument();
    libxml_use_internal_errors(TRUE);
    if(!empty($html)) {
        $doc->loadHTML($html);
        libxml_clear_errors(); // remove errors for yucky html
        $xpath = new DOMXPath($doc);
        $row = $xpath->query($job_location);
        if ($row->length > 0) {
            foreach ($row as $job) {
                // Each pass overwrites the same three keys
                $jobs['title'] = $job->nodeValue;
                $jobs['description'] = "This is a description";
                $jobs['link'] = $job->getAttribute('href');
            }
        }
        else { echo "row is less than 0";}
    }
    else { echo "this is empty";}
}
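However, I need the array in the following format, where each "sub-array" holds the three values from one iteration (I'm only using three entries here as an example):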
$entries = array(
    array(
        "title" => "My first test entry",
        "description" => "This is the first article's description",
        "link" => "http://leolabs.org/my-first-article-url"
    ),
    array(
        "title" => "My second test entry",
        "description" => "This is the second article's description",
        "link" => "http://leolabs.org/my-second-article-url"
    ),
    array(
        "title" => "My third test entry",
        "description" => "This is the third article's description",
        "link" => "http://leolabs.org/my-third-article-url"
    )
);
Update
After trying Durgesh's solution, this is my new code:
function jobscrape($title, $link, $root, $description, $job_location) {
    header("Content-Type: application/rss+xml; charset=UTF-8");
    $xml = new SimpleXMLElement('<rss/>');
    $xml->addAttribute("version", "2.0");
    $channel = $xml->addChild("channel");
    $channel->addChild("title", $title);
    $channel->addChild("link", $link);
    $channel->addChild("description", "This is a description");
    $channel->addChild("language", "en-us");
    $html = file_get_contents($link);
    $doc = new DOMDocument();
    libxml_use_internal_errors(TRUE);
    if(!empty($html)) {
        $doc->loadHTML($html);
        libxml_clear_errors(); // remove errors for yucky html
        $xpath = new DOMXPath($doc);
        $row = $xpath->query($job_location);
        if ($row->length > 0) {
            foreach ($row as $job) {
                $jobs = array();
                $entries = array();
                $jobs['title'] = $job->nodeValue;
                $jobs['description'] = "This is a description";
                $jobs['link'] = $job->getAttribute('href');
                array_push($entries,$jobs);
                foreach ($entries as $entry) {
                    $item = $channel->addChild("item");
                    $item->addChild("title", $entry['title']);
                    $item->addChild("link", $entry['link']);
                    $item->addChild("description", $entry['description']);
                }
                echo $xml->asXML();
            }
        }
        else { echo "row is less than 0";}
    }
    else {
        echo "this is empty";
    }
}
However, my RSS is malformed: the following is output before every <item>, rather than only once at the top of the feed:
<?xml version="1.0"?>
<rss version="2.0"><channel><title>Media Muppet</title><link>http://www.mediargh.com/jobs</link><description>This is a description</description><language>en-us</language>
Answer 0 (score: 2)
If $jobs holds the correct values for one row, you can append it to the $entries array on each iteration with:
array_push($entries,$jobs);
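For reference, here is a minimal sketch of how that could fit together, assuming the XPath query matches `<a>` elements. The helper names `collectEntries` and `buildRss` are hypothetical and not part of the original code. The idea is to create `$entries` once before the loop, append one sub-array per matched node, and serialize the feed a single time after the loop, so the XML declaration and channel header appear only once.

```php
<?php
// Hypothetical helper: returns one sub-array per node matched by $query.
// Assumes the query selects <a> elements that carry an href attribute.
function collectEntries(string $html, string $query): array
{
    $entries = array();                 // created once, before the loop
    if ($html === '') {
        return $entries;
    }

    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate messy real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $rows = $xpath->query($query);
    if ($rows === false) {
        return $entries;                // invalid XPath expression
    }

    foreach ($rows as $job) {
        if (!$job instanceof DOMElement) {
            continue;                   // skip text or attribute nodes
        }
        $entries[] = array(             // same effect as array_push($entries, $jobs)
            'title'       => trim($job->nodeValue),
            'description' => 'This is a description',
            'link'        => $job->getAttribute('href'),
        );
    }
    return $entries;
}

// Hypothetical helper: builds the whole feed and returns it as one string.
function buildRss(string $title, string $link, array $entries): string
{
    // addChild() does not escape "&", so escape the values up front.
    $esc = function (string $value): string {
        return htmlspecialchars($value, ENT_XML1, 'UTF-8');
    };

    $xml = new SimpleXMLElement('<rss/>');
    $xml->addAttribute('version', '2.0');
    $channel = $xml->addChild('channel');
    $channel->addChild('title', $esc($title));
    $channel->addChild('link', $esc($link));
    $channel->addChild('description', 'This is a description');
    $channel->addChild('language', 'en-us');

    foreach ($entries as $entry) {
        $item = $channel->addChild('item');
        $item->addChild('title', $esc($entry['title']));
        $item->addChild('link', $esc($entry['link']));
        $item->addChild('description', $esc($entry['description']));
    }

    return $xml->asXML();               // serialized once, after all items exist
}
```

In the original function this would mean calling `header(...)` once, then `echo buildRss($title, $link, collectEntries(file_get_contents($link), $job_location));` outside of any loop.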