Question

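I'm sure everyone knows that when you type a URL on Facebook, or post a comment, it automatically pulls an image from the article along with, I believe, the title and meta description.

I would really like to implement this kind of feature on a website I'm building. The only problem is that I don't know where to start!

Ideally, I'd like a dedicated page on the site for linking to other articles of interest. I only want to show an image, the title, and a few lines of descriptive text. The title would link directly to the source.

Does anyone have suggestions or pointers that could help me? Any tips are greatly appreciated.

Many thanks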

-J

Answer 1

This might help: http://net.tutsplus.com/tutorials/php/html-parsing-and-screen-scraping-with-the-simple-html-dom-library/

The tutorial uses the PHP Simple HTML DOM Parser to parse HTML content from a file or a URL.

Answer 2

I had to do something similar, and I used jQuery (with PHP as a proxy) to do it.

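<script type="text/javascript">
$(document).ready(function () {
  $("#statusbox").keyup(function () {
    var content = $(this).val();
    var urlRegex = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
    // match() returns null when the text contains no URL
    var urls = content.match(urlRegex);
    if (urls && urls.length > 0) {
      var url = urls[0]; // preview the first URL only
      // "#preview" is an assumed, separate container element for the preview output
      $("#preview").slideDown("slow");
      $("#preview").html("<img src='ajax_loader.gif'>");
      // PHP proxy fetches the page (works around the cross-domain restriction)
      $.get("proxy.php?url=" + encodeURIComponent(url), function (response) {
        var titleMatch = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(response);
        var logoMatch = /src=['"]([^'"]+\.jpg)['"]/i.exec(response);
        var title = titleMatch ? titleMatch[1] : url; // fall back to the URL
        var box = $("#preview").empty();
        if (logoMatch) {
          $("<img class='img'/>").attr("src", logoMatch[1]).appendTo(box);
        }
        // text nodes keep scraped values from being interpreted as HTML
        $("<div/>").append($("<b/>").text(title), "<br/>", document.createTextNode(url)).appendTo(box);
      });
    }
    return false;
  });
});
</script>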
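
The two regexes in the script only find a `<title>` element and the first `.jpg` image. Facebook-style previews usually read the page's Open Graph meta tags (`og:title`, `og:description`, `og:image`), so a slightly more tolerant extractor might look like the sketch below. The function names (`decodeEntities`, `parseAttributes`, `collectMeta`, `extractPreview`) are made up for illustration, and this is still regex-based, so it is only a rough approximation of what a real HTML parser would do.

```javascript
// Decode the handful of HTML entities that commonly appear in titles.
function decodeEntities(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#0?39;/g, "'")
    .replace(/&amp;/g, "&"); // last, so "&amp;lt;" is not decoded twice
}

// Parse one tag's attributes into a plain object with lowercase keys.
function parseAttributes(tag) {
  var attrs = {};
  var attrRegex = /([a-zA-Z_:][-\w:.]*)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/g;
  var m;
  while ((m = attrRegex.exec(tag)) !== null) {
    var value = m[2] !== undefined ? m[2] : (m[3] !== undefined ? m[3] : m[4]);
    attrs[m[1].toLowerCase()] = decodeEntities(value);
  }
  return attrs;
}

// Collect <meta> tags keyed by their "property" or "name" attribute.
// The first occurrence of a key wins.
function collectMeta(html) {
  var meta = Object.create(null);
  var tags = html.match(/<meta\b[^>]*>/gi) || [];
  tags.forEach(function (tag) {
    var attrs = parseAttributes(tag);
    var key = (attrs.property || attrs.name || "").toLowerCase();
    if (key && attrs.content !== undefined && !(key in meta)) {
      meta[key] = attrs.content;
    }
  });
  return meta;
}

// Build { title, description, image }; missing fields are null.
// Open Graph values are preferred, with <title>, the meta description
// and the first <img> as fallbacks.
function extractPreview(html) {
  var meta = collectMeta(html);
  var titleMatch = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  var imgMatch = /<img\b[^>]*>/i.exec(html);
  var firstImg = imgMatch ? parseAttributes(imgMatch[0]).src : undefined;
  return {
    title: meta["og:title"] || (titleMatch ? decodeEntities(titleMatch[1].trim()) : null),
    description: meta["og:description"] || meta["description"] || null,
    image: meta["og:image"] || firstImg || null
  };
}
```

Inside the `$.get` callback, `extractPreview(response)` could stand in for the title and image regexes. The returned values come from a third-party page, so they should still be inserted as text or attributes rather than concatenated into HTML.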

This can certainly be improved. The PHP file can be as simple as:

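<?php
// Minimal proxy: only fetch http(s) URLs so the script cannot be used to read local files.
if (isset($_GET['url'])) {
    $url = $_GET['url'];
    if (preg_match('#^https?://#i', $url)) {
        echo file_get_contents($url);
    }
}
?>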

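A better approach would be to use cURL together with a proper HTML parser, fetching and parsing the page in PHP itself.

Another solution (free + paid) is to use Embedly.

Edit: By the way, Embedly has a WordPress plugin.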

Title: Adding a web scraper to a WordPress site, similar to the Facebook feature
