Can I retrieve sitelinks with the Custom Search API?

Date: 2016-01-19 13:27:30

Tags: google-custom-search google-search-api

I want to scrape the sitelinks shown in Google search results (for example About Us, Home, etc.). Is there any way to retrieve them?

1 answer:

Answer 0 (score: 0)

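I recently implemented the Google Search JSON API, and as far as I understand it, the only way to get a site's links is through the JSON callback, where each result contains a formattedUrl or htmlFormattedUrl. The query would be the site in question, and hopefully the first results will give you the relevant links for that site.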
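As a rough illustration of reading those fields, here is a minimal Python sketch. It assumes the standard Custom Search JSON API endpoint and its `key`, `cx`, and `q` parameters; the function names `build_request_url` and `extract_result_urls` are hypothetical helpers made up for this example, and the API key and engine ID are placeholders you would supply yourself. It only parses a response body and does not perform the HTTP request.

```python
import json
from urllib.parse import urlencode

# Endpoint of the Custom Search JSON API.
API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_request_url(api_key, engine_id, query):
    """Build the request URL. api_key and engine_id are placeholders."""
    params = {"key": api_key, "cx": engine_id, "q": query}
    return API_ENDPOINT + "?" + urlencode(params)


def extract_result_urls(response_text):
    """Return (title, link, formattedUrl) for every item in a response body."""
    data = json.loads(response_text)
    # "items" is missing when the search returns no results.
    return [
        (item.get("title"), item.get("link"), item.get("formattedUrl"))
        for item in data.get("items", [])
    ]


if __name__ == "__main__":
    # Tiny hand-written response body, trimmed to the fields used here.
    sample = json.dumps({
        "items": [
            {
                "title": "Student Experience",
                "link": "https://www.ndsu.edu/scimath/currentstudents/student_experience/",
                "formattedUrl": "https://www.ndsu.edu/scimath/currentstudents/student_experience/",
            }
        ]
    })
    for title, link, formatted in extract_result_urls(sample):
        print(title, link, formatted)
```

Note that this only yields ordinary result URLs. Nothing in the sample item in this answer carries the sitelinks block that the regular Google results page displays.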

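However, if I understand your question correctly, you want to scrape the sublinks of a given site, which is what a web crawler does. If you own the site, you can create a sitemap with any of the many tools available on the web, but if your intent falls under "other", then I believe you are barking up the wrong tree. See this question, which will point you toward creating a simple WebCrawler.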
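To make the crawler idea concrete, here is a small Python sketch of the core step: collecting the same-site links from one page's HTML. It is not taken from the linked question; `LinkCollector` and `same_site_links` are hypothetical names, and fetching the page, following the links recursively, and honoring robots.txt are all left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def same_site_links(base_url, html):
    """Return the unique links in html that stay on base_url's host."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlparse(base_url).netloc
    unique = []
    for link in collector.links:
        if urlparse(link).netloc == host and link not in unique:
            unique.append(link)
    return unique
```

A crawler would call `same_site_links` on each fetched page and queue the returned URLs it has not visited yet.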

// Example customsearch#result item for the query "Deovandski".

 "items": [
  {
   "kind": "customsearch#result",
   "title": "Student Experience - College of Science and Mathematics (NDSU)",
   "htmlTitle": "Student Experience - College of Science and Mathematics (NDSU)",
   "link": "https://www.ndsu.edu/scimath/currentstudents/student_experience/",
   "displayLink": "www.ndsu.edu",
   "snippet": "Sep 16, 2015 ... Association for Computing Machinery Student Chapter Chair: Jordan Goetze \nAdvisor: Brian Slator. Upsilon Pi Epsilon President: Deovandski ...",
   "htmlSnippet": "Sep 16, 2015 \u003cb\u003e...\u003c/b\u003e Association for Computing Machinery Student Chapter Chair: Jordan Goetze \u003cbr\u003e\nAdvisor: Brian Slator. Upsilon Pi Epsilon President: \u003cb\u003eDeovandski\u003c/b\u003e ...",
   "cacheId": "pyzF9XJwrXsJ",
   "formattedUrl": "https://www.ndsu.edu/scimath/currentstudents/student_experience/",
   "htmlFormattedUrl": "https://www.ndsu.edu/scimath/currentstudents/student_experience/",
   "pagemap": {
    "cse_image": [
     {
      "src": "https://www.ndsu.edu/fileadmin/_processed_/csm_080117_anatomy_03med_9dbc3c8cce.jpg"
     }
    ],
    "cse_thumbnail": [
     {
      "width": "184",
      "height": "275",
      "src": "https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcTTL-GZRfSv30cyESsCnd_65BFoLMDdo8fqNS58mHfRbGiOTjSq-e-o28FE"
     }
    ]
   }
  },