Question

I can't crawl the links in a Facebook API response. Crawling other web pages works fine. I'm using Nutch 2.2.1, with HBase 0.9 for storage and Solr for indexing. As the seed I'm using:

https://graph.facebook.com/v2.10/me?fields=friends%7Bfeed%7Bpermalink_url%7D%2Cname%7D&access_token=<MY_ACC_TOKEN>

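Note that the URL itself is fine. At the end of the crawl cycle the seed is saved in my database, but during the fetch step Nutch doesn't see any URLs: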

Fetcher: throughput threshold: -1
-finishing thread FetcherThread49, activeThreads=0
Fetcher: throughput threshold sequence: 5
0/0 spinwaiting/active, 0 pages, 0 errors, 0.0 0 pages/s, 0 0 kb/s, 0 URLs in 0 queues

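I've just tried editing the file that discards any URL containing characters that look like a query, but nothing changed. I've also already enabled https, which is not active by default.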
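To make the URL-filter point concrete, here is a minimal Python sketch of how a first-match rule file of this kind treats the seed. It assumes the file in question is Nutch's `conf/regex-urlfilter.txt` and that it still contains the stock rule `-[?*!@=]`; both are assumptions, since the question does not name the file or show its contents. The helper names `parse_rules` and `accepts` are hypothetical, `TOKEN` stands in for the real access token, and Python's `re` only approximates the Java regex engine Nutch uses. It illustrates the filter semantics only and does not by itself explain why the fetcher reports 0 URLs.

```python
import re

# Rules as they appear in a stock regex-urlfilter.txt (assumed, not taken from the question).
STOCK_RULES = [
    "# skip file:, ftp:, and mailto: urls",
    "-^(file|ftp|mailto):",
    "# skip URLs containing certain characters as probable queries",
    "-[?*!@=]",
    "# accept anything else",
    "+.",
]

# The same rules with the query-character rule removed.
RELAXED_RULES = [rule for rule in STOCK_RULES if rule != "-[?*!@=]"]


def parse_rules(lines):
    """Turn '+regex' / '-regex' lines into (accept, compiled_pattern) pairs."""
    rules = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no rule
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(url, rules):
    """First matching rule decides; a URL that matches no rule is rejected."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False


# Seed from the question, with a placeholder instead of the real token.
SEED = (
    "https://graph.facebook.com/v2.10/me"
    "?fields=friends%7Bfeed%7Bpermalink_url%7D%2Cname%7D&access_token=TOKEN"
)

if __name__ == "__main__":
    print("stock rules accept seed:  ", accepts(SEED, parse_rules(STOCK_RULES)))
    print("relaxed rules accept seed:", accepts(SEED, parse_rules(RELAXED_RULES)))
```

Under these assumptions the stock rules reject the seed because it contains `?` and `=`, while the relaxed rules let it through. If the edit described above had no effect, it may be worth checking that the edited file is the one the running job actually loads.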

How can I solve this?

Answer 1

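Automated scraping is not allowed on Facebook.

You will not engage in Automated Data Collection without Facebook's express written permission.

See the full ToS here: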

https://www.facebook.com/apps/site_scraping_tos_terms.php

Nutch doesn't fetch the graph.facebook response
