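So I'm trying to fetch the latest news from a website and include it on my own site. The site uses Joomla (ugh), and as a result the hrefs in the content are missing the base href. So a link will hold contensite.php?blablabla, which results in the link http://www.mysite.com/contensite.php?blablabla
So I thought of replacing 'http://' with 'http://www.basehref.com' before echoing it. But this is where my knowledge stops. What should I use? preg_replace, str_replace? I'm not sure.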
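Answer 0 (score: 0)
So since I could not fix the broken links (I lack the preg_match knowledge), I instead replace them with another link and replace the class of the links with my fancybox class, so that the source site opens in a fancybox.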
include_once('db_connect.php');
// connect to my db
include_once('dom.php');
// include html_simple_dom!
$dom = file_get_html('http://www.remotesite.com');
// get the html content of a site and pass it through html simple dom !
$elem = $dom->find('div[class=blog]', 0);
// target the blog div
$pattern = '/(?<=href=")[^"]+(?=")/';
$replacement ='http://www.remotesite.com';
$replacedHrefHtml = preg_replace($pattern, $replacement, $elem);
// replacement 1
// replace the broken links (the base href is missing, joomla sucks, period!)
// I'm too lazy to preg_match it any other way, feel free to improve this!
$pattern2 = '/contentpagetitle/';
$replacement2 ='fancybox fancybox.iframe';
$replacedHrefHtml2 = preg_replace($pattern2, $replacement2,$replacedHrefHtml );
// replacement 2
// replace the joomla class on the links with the class contentpagetitle to my fancybox class ! fancy innit!
$pattern3 = '/readon/';
$replacement3 ='fancybox fancybox.iframe';
$replacedHrefHtml2 = preg_replace($pattern3, $replacement3, $replacedHrefHtml2);
// replacement 3 (applied to the result of replacement 2 so that its changes are kept)
// replace the joomla class on the links with class readon to my fancybox class ! fancy innit!
$replacedHrefHtml3 = preg_replace("/<img[^>]+\>/i", "<br />(Plaatje)<br /><br /> ", $replacedHrefHtml2);
// finally remove the images from the string !
$replacedHrefHtml4 = base64_encode($replacedHrefHtml3);
// encode the html with base64 before storing it in mysql
// real escape won't work since it will break the links !
$lastitem = null;
// stays null when the table is still empty
try {
    $conn = new PDO($link, $pdo_username, $pdo_password);
    $conn->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $data222 = $conn->query('SELECT * FROM svvnieuws ORDER BY id DESC LIMIT 1');
    foreach ($data222 as $row) {
        $lastitem = $row['inhoud'];
    }
} catch (PDOException $e) {
    echo 'ERROR: ' . $e->getMessage();
}
// get the last stored item from the db for comparison with the current result!
if ($replacedHrefHtml4 !== $lastitem) {
    // only store a new item when it differs from the last one in the db ! important to prevent clutter !
    $conn = new PDO($link, $pdo_username, $pdo_password);
    $conn->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    // set up the connection to the db
    $sql = 'INSERT INTO svvnieuws (inhoud) VALUES (:inhoud)';
    // set the mysql query string with a named placeholder
    // (assumption: id is an auto-increment column, so it is left out)
    $rip = $conn->prepare($sql);
    $rip->execute(array(':inhoud' => $replacedHrefHtml4));
    // insert to the db !
}
// if the last item from the db is the same, nothing is stored
// place this file outside of the docroot, and let cron run it every, say, 4 hours.
// of course, make sure you also place dom.php in the same directory!
// dom.php is my short name for PHP Simple HTML DOM.
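So replacement 1 changes
<a href="whatever"> to <a href="http://www.remotesite.com">
replacement 2 replaces the class on that link with fancybox
replacement 3 replaces the class on the readon links with fancybox
then the result is compared with the last stored item
and stored if it is different.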
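The replacement steps above can be tried on their own, without the scraper or the database. This is a minimal sketch: the sample markup in $html is made up for illustration, and replacements 2 and 3 are merged into one pattern, which is a simplification of the script above rather than what it literally does.

```php
<?php
// Hypothetical sample of the scraped markup; the real input comes from the remote site.
$html = '<a class="contentpagetitle" href="/index.php?id=1">Title</a>'
      . '<img src="/images/a.png" />'
      . '<a class="readon" href="/index.php?id=1">Read more</a>';

// Replacement 1: overwrite every href value with the remote site's home page.
$step1 = preg_replace('/(?<=href=")[^"]+(?=")/', 'http://www.remotesite.com', $html);

// Replacements 2 and 3: swap both Joomla classes for the fancybox class in one call.
$step2 = preg_replace('/contentpagetitle|readon/', 'fancybox fancybox.iframe', $step1);

// Finally replace each image tag with a text placeholder.
$step3 = preg_replace('/<img[^>]+>/i', '<br />(Plaatje)<br /><br /> ', $step2);

echo $step3, PHP_EOL;
```

Chaining each call on the previous result ($step1, $step2, $step3) makes it harder to accidentally feed an older string into a later replacement.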
I would love to know how to fix the broken links instead of replacing them. The links from that site come like this: <a href="/index.php?blabla"> If possible, I would like to inject www.mysite.com into <a href="/index.php?blabla"> to make it <a href="www.remotesite.com/index.php?blabla">
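One possible way to do that injection with preg_replace is sketched below. It only handles root-relative links (those starting with a single slash); the $base value and the sample markup are assumptions for illustration, not taken from the real site.

```php
<?php
// Assumed base URL of the remote site (hypothetical value for illustration).
$base = 'http://www.remotesite.com';

// Hypothetical sample markup with one root-relative and one absolute link.
$html = '<a href="/index.php?blabla">News</a> <a href="http://example.org/x">Other</a>';

// Match href=" followed by a single slash (not "//", which is protocol-relative)
// and insert the base URL between the quote and the slash.
$fixed = preg_replace('~(href=")/(?!/)~i', '${1}' . $base . '/', $html);

echo $fixed, PHP_EOL;
```

A plain str_replace('href="/', 'href="' . $base . '/', $html) would do much the same, but it would also rewrite protocol-relative links such as href="//cdn...", which is why the sketch uses a negative lookahead.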
Answer 1 (score: 0)
include_once('db_connect.php');
// connect to my db
require_once('Net/URL2.php');
include_once('dom.php');
// include html_simple_dom!
$dom = file_get_html('http://www.targetsite.com');
// get the html content of a site and pass it through html simple dom !
$elem2 = $dom->find('div[class=blog]', 0);
// target the blog div
$uri = new Net_URL2('http://www.svvenray.nl'); // URI of the resource
$baseURI = $uri;
foreach ($elem2->find('base[href]') as $elem) {
$baseURI = $uri->resolve($elem->href);
}
foreach ($elem2->find('*[src]') as $elem) {
$elem->src = $baseURI->resolve($elem->src)->__toString();
}
foreach ($elem2->find('*[href]') as $elem) {
if (strtoupper($elem->tag) === 'BASE') continue;
$elem->href = $baseURI->resolve($elem->href)->__toString();
}
echo $elem2;
This fixes all the broken links; it requires the PHP PEAR package Net_URL2 (Net/URL2.php).
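To get a feel for what resolving a link against a base URI means, here is a rough, dependency-free sketch. It is not Net_URL2 and not a full RFC 3986 implementation: the resolveUrl helper is hypothetical, ignores ports, "../" segments, and query-only or fragment-only references, and only covers absolute, root-relative, and simple relative paths.

```php
<?php
// Hypothetical helper: a simplified stand-in for a real URI resolver.
function resolveUrl(string $base, string $url): string
{
    // Already absolute (has a scheme) or protocol-relative: leave unchanged.
    if (preg_match('~^([a-z][a-z0-9+.-]*:|//)~i', $url)) {
        return $url;
    }
    $parts  = parse_url($base);
    $origin = $parts['scheme'] . '://' . $parts['host'];
    // Root-relative: attach directly to scheme and host.
    if ($url !== '' && $url[0] === '/') {
        return $origin . $url;
    }
    // Otherwise resolve against the directory part of the base path.
    $dir = isset($parts['path']) ? preg_replace('~[^/]*$~', '', $parts['path']) : '/';
    return $origin . $dir . $url;
}

// Assumed base URL for illustration.
$base = 'http://www.remotesite.com/news/index.php';

echo resolveUrl($base, '/index.php?blabla'), PHP_EOL;
echo resolveUrl($base, 'images/a.png'), PHP_EOL;
echo resolveUrl($base, 'http://example.org/x'), PHP_EOL;
```

For anything beyond these simple cases, a tested resolver such as the Net_URL2 one used above is the safer choice.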