我在哪里可以找到Crawler或Spider IP地址的完整列表。我需要来自谷歌雅虎微软和其他定期抓取我的网站的搜索引擎的IP。
我不想禁用它们,所以请将robots.txt文件保留在答案之外。该列表用于对每个页面上的活动进行统计报告的过滤器。
请将链接发布到可以使用的好资源。付费或免费。
答案 0 :(得分:4)
您可能不希望通过IP地址执行此操作。大多数抓取工具在抓取您的网站时会发送唯一的用户代理字符串,而您更有可能希望使用该字符串来识别它们。我不知道在哪里可以找到一个好的列表,但
编辑:实际上我在谷歌找到的this页似乎都回答了你的问题,并且还给了用户代理(这更可能是一种更好的方法)
答案 1 :(得分:2)
您的网络服务器日志。我相信他们是自由的。
答案 2 :(得分:2)
了解网络搜索引擎的合法IP并不是一件小事。用户代理很容易被欺骗。您可以做的最好的事情是手动记录日志并观察他们的行为。 IP也可能随着时间的推移而发生变化,甚至会因某些恶意目的而被欺骗。
答案 3 :(得分:2)
<强>&GT;&GT; Updated list&lt;&lt;
截至2016年1月16日的名单
[
{
"pattern": "googlebot\\/",
"url": "http://www.google.com/bot.html"
},
{
"pattern": "Googlebot-Mobile"
},
{
"pattern": "Googlebot-Image"
},
{
"pattern": "Mediapartners-Google",
"url": "https://support.google.com/webmasters/answer/1061943?hl=en"
},
{
"pattern": "bingbot",
"url": "http://www.bing.com/bingbot.htm"
},
{
"pattern": "slurp",
"url": "http://help.yahoo.com/help/us/ysearch/slurp"
},
{
"pattern": "java"
},
{
"pattern": "wget"
},
{
"pattern": "curl"
},
{
"pattern": "Commons-HttpClient"
},
{
"pattern": "Python-urllib"
},
{
"pattern": "libwww"
},
{
"pattern": "httpunit"
},
{
"pattern": "nutch"
},
{
"pattern": "phpcrawl",
"addition_date": "2012-09/17",
"url": "http://phpcrawl.cuab.de/"
},
{
"pattern": "msnbot",
"url": "http://search.msn.com/msnbot.htm"
},
{
"pattern": "jyxobot"
},
{
"pattern": "FAST-WebCrawler"
},
{
"pattern": "FAST Enterprise Crawler"
},
{
"pattern": "biglotron"
},
{
"pattern": "teoma"
},
{
"pattern": "convera"
},
{
"pattern": "seekbot"
},
{
"pattern": "gigablast",
"instances": ["Gigabot/2.0 (http://www.gigablast.com/spider.html)", "Gigabot/2.0 (http://www.gigablast.com/spider.html)", "GigablastOpenSource/1.0"],
"url": "https://github.com/gigablast/open-source-search-engine"
},
{
"pattern": "exabot"
},
{
"pattern": "ngbot"
},
{
"pattern": "ia_archiver"
},
{
"pattern": "GingerCrawler"
},
{
"pattern": "webmon "
},
{
"pattern": "httrack"
},
{
"pattern": "webcrawler"
},
{
"pattern": "grub.org"
},
{
"pattern": "UsineNouvelleCrawler"
},
{
"pattern": "antibot"
},
{
"pattern": "netresearchserver"
},
{
"pattern": "speedy"
},
{
"pattern": "fluffy"
},
{
"pattern": "bibnum.bnf"
},
{
"pattern": "findlink"
},
{
"pattern": "msrbot"
},
{
"pattern": "panscient"
},
{
"pattern": "yacybot"
},
{
"pattern": "AISearchBot"
},
{
"pattern": "IOI"
},
{
"pattern": "ips-agent"
},
{
"pattern": "tagoobot"
},
{
"pattern": "MJ12bot"
},
{
"pattern": "dotbot"
},
{
"pattern": "woriobot"
},
{
"pattern": "yanga"
},
{
"pattern": "buzzbot"
},
{
"pattern": "mlbot"
},
{
"pattern": "yandexbot",
"url": "http://yandex.com/bots",
"instances": ["Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"],
"addition_date": "2015/04/14"
},
{
"pattern": "purebot",
"addition_date": "2010/01/19"
},
{
"pattern": "Linguee Bot",
"addition_date": "2010/01/26",
"url": "http://www.linguee.com/bot"
},
{
"pattern": "Voyager",
"addition_date": "2010/02/01",
"url": "http://www.kosmix.com/crawler.html"
},
{
"pattern": "CyberPatrol",
"addition_date": "2010/02/11",
"url": "http://www.cyberpatrol.com/cyberpatrolcrawler.asp"
},
{
"pattern": "voilabot",
"addition_date": "2010/05/18"
},
{
"pattern": "baiduspider",
"addition_date": "2010/07/15",
"url": "http://www.baidu.jp/spider/"
},
{
"pattern": "citeseerxbot",
"addition_date": "2010/07/17"
},
{
"pattern": "spbot",
"addition_date": "2010/07/31",
"url": "http://www.seoprofiler.com/bot"
},
{
"pattern": "twengabot",
"addition_date": "2010/08/03",
"url": "http://www.twenga.com/bot.html"
},
{
"pattern": "postrank",
"addition_date": "2010/08/03",
"url": "http://www.postrank.com"
},
{
"pattern": "turnitinbot",
"addition_date": "2010/09/26",
"url": "http://www.turnitin.com"
},
{
"pattern": "scribdbot",
"addition_date": "2010/09/28",
"url": "http://www.scribd.com"
},
{
"pattern": "page2rss",
"addition_date": "2010/10/07",
"url": "http://www.page2rss.com"
},
{
"pattern": "sitebot",
"addition_date": "2010/12/15",
"url": "http://www.sitebot.org"
},
{
"pattern": "linkdex",
"addition_date": "2011/01/06",
"url": "http://www.linkdex.com"
},
{
"pattern": "Adidxbot",
"url": "http://onlinehelp.microsoft.com/en-us/bing/hh204496.aspx"
},
{
"pattern": "blekkobot",
"url": "http://blekko.com/about/blekkobot"
},
{
"pattern": "ezooms",
"addition_date": "2011/04/27",
"url": "http://www.phpbb.com/community/viewtopic.php?f=64&t=935605&start=450#p12948289"
},
{
"pattern": "dotbot",
"addition_date": "2011/04/27"
},
{
"pattern": "Mail.RU_Bot",
"addition_date": "2011/04/27",
"instances" : [
"Mozilla/5.0 (compatible; Linux x86_64; Mail.RU_Bot/2.0; +http://go.mail.ru/",
"Mozilla/5.0 (compatible; Mail.RU_Bot/2.0; +http://go.mail.ru/"
]
},
{
"pattern": "discobot",
"addition_date": "2011/05/03",
"url": "http://discoveryengine.com/discobot.html"
},
{
"pattern": "heritrix",
"addition_date": "2011/06/21",
"url": "http://crawler.archive.org/"
},
{
"pattern": "findthatfile",
"addition_date": "2011/06/21",
"url": "http://www.findthatfile.com/"
},
{
"pattern": "europarchive.org",
"addition_date": "2011/06/21",
"url": ""
},
{
"pattern": "NerdByNature.Bot",
"addition_date": "2011/07/12",
"url": "http://www.nerdbynature.net/bot"
},
{
"pattern": "sistrix crawler",
"addition_date": "2011/08/02"
},
{
"pattern": "ahrefsbot",
"addition_date": "2011/08/28"
},
{
"pattern": "Aboundex",
"addition_date": "2011/09/28",
"url": "http://www.aboundex.com/crawler/"
},
{
"pattern": "domaincrawler",
"addition_date": "2011/10/21"
},
{
"pattern": "wbsearchbot",
"addition_date": "2011/12/21",
"url": "http://www.warebay.com/bot.html"
},
{
"pattern": "summify",
"addition_date": "2012/01/04",
"url": "http://summify.com"
},
{
"pattern": "ccbot",
"addition_date": "2012/02/05",
"url": "http://www.commoncrawl.org/bot.html"
},
{
"pattern": "edisterbot",
"addition_date": "2012/02/25"
},
{
"pattern": "seznambot",
"addition_date": "2012/03/14"
},
{
"pattern": "ec2linkfinder",
"addition_date": "2012/03/22"
},
{
"pattern": "gslfbot",
"addition_date": "2012/04/03"
},
{
"pattern": "aihitbot",
"addition_date": "2012/04/16"
},
{
"pattern": "intelium_bot",
"addition_date": "2012/05/07"
},
{
"pattern": "facebookexternalhit",
"addition_date": "2012/05/07"
},
{
"pattern": "yeti",
"addition_date": "2012/05/07"
},
{
"pattern": "RetrevoPageAnalyzer",
"addition_date": "2012/05/07"
},
{
"pattern": "lb-spider",
"addition_date": "2012/05/07"
},
{
"pattern": "sogou",
"addition_date": "2012/05/13",
"url": "http://www.sogou.com/docs/help/webmasters.htm#07"
},
{
"pattern": "lssbot",
"addition_date": "2012/05/15"
},
{
"pattern": "careerbot",
"addition_date": "2012/05/23",
"url": "http://www.career-x.de/bot.html"
},
{
"pattern": "wotbox",
"addition_date": "2012/06/12",
"url": "http://www.wotbox.com"
},
{
"pattern": "wocbot",
"addition_date": "2012/07/25",
"url": "http://www.wocodi.com/crawler"
},
{
"pattern": "ichiro",
"addition_date": "2012/08/28",
"url": "http://help.goo.ne.jp/help/article/1142"
},
{
"pattern": "DuckDuckBot",
"addition_date": "2012/09/19",
"url": "http://duckduckgo.com/duckduckbot.html"
},
{
"pattern": "lssrocketcrawler",
"addition_date": "2012/09/24"
},
{
"pattern": "drupact",
"addition_date": "2012/09/27",
"url": "http://www.arocom.de/drupact"
},
{
"pattern": "webcompanycrawler",
"addition_date": "2012/10/03"
},
{
"pattern": "acoonbot",
"addition_date": "2012/10/07",
"url": "http://www.acoon.de/robot.asp"
},
{
"pattern": "openindexspider",
"addition_date": "2012/10/26",
"url": "http://www.openindex.io/en/webmasters/spider.html"
},
{
"pattern": "gnam gnam spider",
"addition_date": "2012/10/31"
},
{
"pattern": "web-archive-net.com.bot"
},
{
"pattern": "backlinkcrawler",
"addition_date": "2013/01/04"
},
{
"pattern": "coccoc",
"addition_date": "2013/01/04",
"url": "http://help.coccoc.vn/"
},
{
"pattern": "integromedb",
"addition_date": "2013/01/10",
"url": "http://www.integromedb.org/Crawler"
},
{
"pattern": "content crawler spider",
"addition_date": "2013/01/11"
},
{
"pattern": "toplistbot",
"addition_date": "2013/02/05"
},
{
"pattern": "seokicks-robot",
"addition_date": "2013/02/25"
},
{
"pattern": "it2media-domain-crawler",
"addition_date": "2013/03/12"
},
{
"pattern": "ip-web-crawler.com",
"addition_date": "2013/03/22"
},
{
"pattern": "siteexplorer.info",
"addition_date": "2013/05/01"
},
{
"pattern": "elisabot",
"addition_date": "2013/06/27"
},
{
"pattern": "proximic",
"addition_date": "2013/09/12",
"url": "http://www.proximic.com/info/spider.php"
},
{
"pattern": "changedetection",
"addition_date": "2013/09/13",
"url": "http://www.changedetection.com/bot.html"
},
{
"pattern": "blexbot",
"addition_date": "2013/10/03",
"url": "http://webmeup-crawler.com/"
},
{
"pattern": "arabot",
"addition_date": "2013/10/09"
},
{
"pattern": "WeSEE:Search",
"addition_date": "2013/11/18"
},
{
"pattern": "niki-bot",
"addition_date": "2014/01/01"
},
{
"pattern": "CrystalSemanticsBot",
"addition_date": "2014/02/17",
"url": "http://www.crystalsemantics.com/user-agent/"
},
{
"pattern": "rogerbot",
"addition_date": "2014/02/28",
"url": "http://moz.com/help/pro/what-is-rogerbot-"
},
{
"pattern": "360Spider",
"addition_date": "2014/03/14",
"url": "http://needs-be.blogspot.co.uk/2013/02/how-to-block-spider360.html"
},
{
"pattern": "psbot",
"addition_date": "2014/03/31",
"url": "http://www.picsearch.com/bot.html"
},
{
"pattern": "InterfaxScanBot",
"addition_date": "2014/03/31",
"url": "http://scan-interfax.ru"
},
{
"pattern": "Lipperhey SEO Service",
"addition_date": "2014/04/01",
"url": "http://www.lipperhey.com/"
},
{
"pattern": "CC Metadata Scaper",
"addition_date": "2014/04/01",
"url": "http://wiki.creativecommons.org/Metadata_Scraper"
},
{
"pattern": "g00g1e.net",
"addition_date": "2014/04/01",
"url": "http://www.g00g1e.net/"
},
{
"pattern": "GrapeshotCrawler",
"addition_date": "2014/04/01",
"url": "http://www.grapeshot.co.uk/crawler.php"
},
{
"pattern": "urlappendbot",
"addition_date": "2014/05/10",
"url": "http://www.profound.net/urlappendbot.html"
},
{
"pattern": "brainobot",
"addition_date": "2014/06/24"
},
{
"pattern": "fr-crawler",
"addition_date": "2014/07/31",
"instances": ["Mozilla/5.0 (compatible; fr-crawler/1.1)"]
},
{
"pattern": "binlar",
"addition_date": "2014/09/12",
"instances": [
"binlar_2.6.3 binlar2.6.3@unspecified.mail",
"binlar_2.6.3 binlar_2.6.3@unspecified.mail",
"binlar_2.6.3 larbin2.6.3@unspecified.mail",
"binlar_2.6.3 phanendra_kalapala@McAfee.com",
"binlar_2.6.3 test@mgmt.mic"
]
},
{
"pattern": "SimpleCrawler",
"addition_date": "2014/09/12",
"instances": ["SimpleCrawler/0.1" ]
},
{
"pattern": "Livelapbot",
"addition_date": "2014/09/12",
"instances": ["Livelapbot/0.1" ]
},
{
"pattern": "Twitterbot",
"addition_date": "2014/09/12",
"instances": ["Twitterbot/0.1", "Twitterbot/1.0" ]
},
{
"pattern": "cXensebot",
"addition_date": "2014/10/05",
"instances": ["cXensebot/1.1a"],
"url": "http://www.cxense.com/bot.html"
},
{
"pattern": "smtbot",
"addition_date": "2014/10/04",
"instances": ["Mozilla/5.0 (compatible; SMTBot/1.0; +http://www.similartech.com/smtbo)t", "SMTBot (similartech.com/smtbot)"],
"url": "http://www.similartech.com/smtbot"
},
{
"pattern": "bnf.fr_bot",
"addition_date": "2014/11/18",
"url": "http://www.bnf.fr/fr/outils/a.dl_web_capture_robot.html",
"instances": ["Mozilla/5.0 (compatible; bnf.fr_bot; +http://www.bnf.fr/fr/outils/a.dl_web_capture_robot.html)"]
},
{
"pattern": "A6-Indexer",
"addition_date": "2014/12/05",
"url": "http://www.a6corp.com/a6-web-scraping-policy/",
"instances": ["A6-Indexer"]
},
{
"pattern": "ADmantX",
"addition_date": "2014/12/05",
"url": "http://www.admantx.com",
"instances": ["ADmantX Platform Semantic Analyzer - ADmantX Inc. - www.admantx.com - support@admantx.com"]
},
{
"pattern": "Facebot",
"url": "https://developers.facebook.com/docs/sharing/best-practices#crawl",
"addition_date": "2014/12/30"
},
{
"pattern": "Twitterbot",
"url": "https://dev.twitter.com/cards/getting-started",
"addition_date": "2014/12/30"
},
{
"pattern": "OrangeBot",
"instances": ["Mozilla/5.0 (compatible; OrangeBot/2.0; support.orangebot@orange.com"],
"addition_date": "2015/01/12"
},
{
"pattern": "memorybot",
"url": "http://mignify.com/bot.htm",
"instances": ["Mozilla/5.0 (compatible; memorybot/1.21.14 +http://mignify.com/bot.html)"],
"addition_date": "2015/02/01"
},
{
"pattern": "AdvBot",
"url": "http://advbot.net/bot.html",
"instances": ["Mozilla/5.0 (compatible; AdvBot/2.0; +http://advbot.net/bot.html)"],
"addition_date": "2015/02/01"
},
{
"pattern": "MegaIndex",
"url": "https://www.megaindex.ru/?tab=linkAnalyze",
"instances": ["Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +https://www.megaindex.ru/?tab=linkAnalyze)"],
"addition_date": "2015/03/28"
},
{
"pattern": "SemanticScholarBot",
"url": "http://s2.allenai.org/bot.html",
"instances": ["SemanticScholarBot/1.0 (+http://s2.allenai.org/bot.html)"],
"addition_date": "2015/03/28"
},
{
"pattern": "ltx71",
"url": "http://ltx71.com/",
"instances": ["ltx71 - (http://ltx71.com/)"],
"addition_date": "2015/04/04"
},
{
"pattern": "nerdybot",
"url": "http://nerdybot.com/",
"instances": ["nerdybot"],
"addition_date": "2015/04/05"
},
{
"pattern": "xovibot",
"url": "http://www.xovibot.net/",
"instances": ["Mozilla/5.0 (compatible; XoviBot/2.0; +http://www.xovibot.net/)"],
"addition_date": "2015/04/05"
},
{
"pattern": "BUbiNG",
"url": "http://law.di.unimi.it/BUbiNG.html",
"instances": ["BUbiNG (+http://law.di.unimi.it/BUbiNG.html)"],
"addition_date": "2015/04/06"
},
{
"pattern": "Qwantify",
"url": "https://www.qwant.com/",
"instances": ["Mozilla/5.0 (compatible; Qwantify/2.0n; +https://www.qwant.com/)/*"],
"addition_date": "2015/04/06"
},
{
"pattern": "archive.org_bot",
"url": "http://www.archive.org/details/archive.org_bot",
"instances": ["Mozilla/5.0 (compatible; archive.org_bot +http://www.archive.org/details/archive.org_bot)"],
"addition_date": "2015/04/14"
},
{
"pattern": "Applebot",
"url": "http://www.apple.com/go/applebot",
"addition_date": "2015/04/15"
},
{
"pattern": "TweetmemeBot",
"url": "http://datasift.com/bot.html",
"instances": ["Mozilla/5.0 (TweetmemeBot/4.0; +http://datasift.com/bot.html) Gecko/20100101 Firefox/31.0"],
"addition_date": "2015/04/15"
},
{
"pattern": "crawler4j",
"url": "https://github.com/yasserg/crawler4j",
"instances": ["crawler4j (http://code.google.com/p/crawler4j/)"],
"addition_date": "2015/05/07"
},
{
"pattern": "findxbot",
"url": "http://www.findxbot.com",
"instances": ["Mozilla/5.0 (compatible; Findxbot/1.0; +http://www.findxbot.com)"],
"addition_date": "2015/05/07"
},
{
"pattern": "SemrushBot",
"url": "http://www.semrush.com/bot.html",
"instances": ["Mozilla/5.0 (compatible; SemrushBot/0.98~bl; +http://www.semrush.com/bot.html)"],
"addition_date": "2015/05/26"
},
{
"pattern": "yoozBot",
"url": "http://yooz.ir",
"instances": ["Mozilla/5.0 (compatible; yoozBot-2.2; http://yooz.ir; info@yooz.ir)"],
"addition_date": "2015/05/26"
},
{
"pattern": "lipperhey",
"url": "http://www.lipperhey.com/",
"instances": ["Mozilla/5.0 (compatible; Lipperhey Link Explorer; http://www.lipperhey.com/)", "Mozilla/5.0 (compatible; Lipperhey SEO Service; http://www.lipperhey.com/)", "Mozilla/5.0 (compatible; Lipperhey Site Explorer; http://www.lipperhey.com/)", "Mozilla/5.0 (compatible; Lipperhey-Kaus-Australis/5.0; +https://www.lipperhey.com/en/about/)"],
"addition_date": "2015/08/26"
},
{
"pattern": "y!j-asr",
"url": "http://www.yahoo-help.jp/app/answers/detail/p/595/a_id/42716/",
"instances": ["Y!J-ASR/0.1 crawler (http://www.yahoo-help.jp/app/answers/detail/p/595/a_id/42716/)"],
"addition_date": "2015/05/26"
},
{
"pattern": "Domain Re-Animator Bot",
"url": "http://domainreanimator.com",
"instances": ["Domain Re-Animator Bot (http://domainreanimator.com) - support@domainreanimator.com"],
"addition_date": "2015/04/14"
},
{
"pattern": "AddThis",
"url": "https://www.addthis.com",
"instances": ["AddThis.com robot tech.support@clearspring.com"],
"addition_date": "2015/06/02"
},
{
"pattern": "Screaming Frog SEO Spider",
"url": "http://www.screamingfrog.co.uk/seo-spider",
"instances": ["Screaming Frog SEO Spider/5.1"],
"addition_date": "2016/01/08"
},
{
"pattern": "MetaURI",
"url": "http://www.useragentstring.com/MetaURI_id_17683.php",
"instances": ["MetaURI API/2.0 +metauri.com"],
"addition_date": "2016/01/02"
},
{
"pattern": "Scrapy",
"url": "http://scrapy.org/",
"instances": ["Scrapy/1.0.3 (+http://scrapy.org)"],
"addition_date": "2016/01/02"
},
{
"pattern": "LivelapBot",
"url": "http://site.livelap.com/crawler",
"instances": ["LivelapBot/0.2 (http://site.livelap.com/crawler)"],
"addition_date": "2016/01/02"
},
{
"pattern": "OpenHoseBot",
"url": "http://www.openhose.org/bot.html",
"instances": ["Mozilla/5.0 (compatible; OpenHoseBot/2.1; +http://www.openhose.org/bot.html)"],
"addition_date": "2016/01/02"
},
{
"pattern": "CapsuleChecker",
"url": "http://www.capsulink.com/about",
"instances": ["CapsuleChecker (http://www.capsulink.com/)"],
"addition_date": "2016/01/02"
},
{
"pattern": "collection@infegy.com",
"url": "http://infegy.com/",
"instances": ["Mozilla/5.0 (compatible) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.73 Safari/537.36 collection@infegy.com"],
"addition_date": "2016/01/03"
},
{
"pattern": "IstellaBot",
"url": "http://www.tiscali.it/",
"instances": ["Mozilla/5.0 (compatible; IstellaBot/1.23.15 +http://www.tiscali.it/)"],
"addition_date": "2016/01/09"
}
]
答案 4 :(得分:1)
发现Shodan.IO bot ip地址 198.20.69.72 - 198.20.69.79 198.20.69.96 - 198.20.69.103
我确定可能还有其他地址在使用,但这些是我通过一点挖掘找到的......