Question

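When a user pastes a link into a Facebook status, it fires off a call to fetch details about that page.

What I'd like to know is whether anyone has a similar function for breaking a page down?

Thinking about it, getting it is just a matter of matching some regular expressions.

It then usually gets an array of images, which could also be done easily with regular expressions, possibly filtering out images that are too small.

What I can't work out is how it figures out which text is relevant. Any ideas?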

Answer 1

Perhaps looking at an article extractor such as Goose would help?

Answer 2

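Regular expressions are a poor fit for parsing HTML because of its hierarchical (nested) structure. You'll want to use the DOMDocument class instead.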

http://www.php.net/manual/en/class.domdocument.php

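This parses the page source into a DOM tree, much like an XML object. From there you should be able to work out quite easily how to pull the relevant details with XPath queries.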
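
As a rough illustration of the DOMDocument and XPath approach, here is a minimal sketch that pulls the title and the image sources out of an HTML string. The function name extractPageDetails() and the sample markup are made up for this example; in practice you would fetch the page source first.

```php
<?php
// Hypothetical helper, not from any library: returns the <title> text
// and the src of every <img> found in an HTML string.
function extractPageDetails(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid, so collect parser warnings
    // internally instead of letting them be printed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // string() yields the text of the first matching node, or '' if none.
    $title = trim($xpath->evaluate('string(//title)'));

    $images = [];
    foreach ($xpath->query('//img[@src]') as $img) {
        $images[] = $img->getAttribute('src');
    }

    return ['title' => $title, 'images' => $images];
}

// Sample markup made up for this example.
$html = '<html><head><title>Example page</title></head>'
      . '<body><img src="a.png"><p>Hello</p><img src="b.jpg"></body></html>';

$details = extractPageDetails($html);
print $details['title'] . "\n";
print implode(', ', $details['images']) . "\n";
```

Filtering out small images would be a further step on top of this, for example by checking each image's dimensions after collecting the URLs.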

You might also want to take a look at the PHP function get_meta_tags().

http://www.php.net/manual/en/function.get-meta-tags.php
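
A minimal sketch of get_meta_tags() in use. The function takes a filename or URL, so this example writes some made-up markup to a temporary file to stay self-contained; normally you would pass the page URL directly.

```php
<?php
// Made-up markup for this example.
$html = '<html><head>'
      . '<meta name="description" content="A short summary of the page">'
      . '<meta name="keywords" content="php, scraping">'
      . '<meta name="geo.position" content="49.33;-86.59">'
      . '</head><body></body></html>';

// get_meta_tags() reads from a file or URL, not from a string.
$file = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($file, $html);

$tags = get_meta_tags($file);
unlink($file);

// Names are lowercased and special characters become underscores,
// so "geo.position" is read back as "geo_position".
print $tags['description'] . "\n";
print $tags['geo_position'] . "\n";
```

Note that this function keys on the name attribute, so tags written as property="og:..." would need a different approach, such as an XPath query.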

Answer 3

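It's worth mentioning that since introducing Open Graph support, Facebook has saved a lot of time and server load when parsing (scraping) pages that use the protocol.

See the PHP implementations for more information. Here is a small example using one of those libraries (OpenGraphNode in PHP):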

include "OpenGraphNode.php";

# Fetch and parse a URL
#
$page = "http://www.rottentomatoes.com/m/oceans_eleven/";
$node = new OpenGraphNode($page);

# Retrieve the title
#
print $node->title . "\n";    # like this
print $node->title() . "\n";  # or with parentheses

# And obviously the above works for other Open Graph Protocol
# properties like "image", "description", etc. For properties
# that contain a hyphen, you'll need to use underscore instead:
#
print $node->street_address . "\n";

# OpenGraphNode uses PHP5's Iterator feature, so you can
# loop through it like an array.
#
foreach ($node as $key => $value) {
    print "$key => $value\n";
}
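
If you would rather not depend on a library, the same tags can be read with DOMDocument and XPath. This is only a sketch of the general idea, not how OpenGraphNode works internally; readOpenGraph() and the sample markup are hypothetical names made up for this example.

```php
<?php
// Hypothetical helper, not part of OpenGraphNode: collects og:* properties
// from <meta property="og:..." content="..."> tags in an HTML string.
function readOpenGraph(string $html): array
{
    $doc = new DOMDocument();

    // Suppress warnings about invalid markup while parsing.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $properties = [];
    foreach ($xpath->query('//meta[starts-with(@property, "og:")]') as $meta) {
        // Strip the "og:" prefix; a repeated property keeps its first value.
        $key = substr($meta->getAttribute('property'), 3);
        if (!isset($properties[$key])) {
            $properties[$key] = $meta->getAttribute('content');
        }
    }

    return $properties;
}

// Sample markup made up for this example.
$html = '<html><head>'
      . '<meta property="og:title" content="Example title">'
      . '<meta property="og:image" content="http://example.com/a.png">'
      . '</head><body></body></html>';

foreach (readOpenGraph($html) as $key => $value) {
    print "$key => $value\n";
}
```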

How does Facebook break down a linked page?

3 answers: