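How can I check whether a URL exists without pulling it down? I am using the following code, but it downloads the whole file. I only need to check whether it exists.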
app.get('/api/v1/urlCheck/', function (req, res) {
    var url = req.query['url'];
    var request = require('request');
    request.get(url, {timeout: 30000, json: false}, function (error, result) {
        res.send(result.body);
    });
});
Thanks for any help!
Answer 0 (score: 28)
Try this:
var http = require('http'),
    options = {method: 'HEAD', host: 'stackoverflow.com', port: 80, path: '/'},
    req = http.request(options, function (r) {
        // A HEAD response carries headers only, so no body is downloaded.
        console.log(JSON.stringify(r.headers));
    });
req.end();
Answer 1 (score: 13)
Thanks! Here it is wrapped in a function (updated 5/30/17, with the require calls moved outside):
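var http = require('http'),
    url = require('url');

exports.checkUrlExists = function (Url, callback) {
    var parsed = url.parse(Url);
    var options = {
        method: 'HEAD',
        // hostname rather than host, so a port in the URL is not treated as part of the name
        hostname: parsed.hostname,
        port: 80,
        path: parsed.pathname
    };
    var req = http.request(options, function (r) {
        callback(r.statusCode == 200);
    });
    // Without this handler a DNS or connection failure raises an unhandled 'error' event.
    req.on('error', function () {
        callback(false);
    });
    req.end();
};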
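It is very fast (about 50 ms for me, but that depends on your connection and the server's speed). Note that it is also very basic: it always talks plain http on port 80, and it does not handle redirects well...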
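One way around the hard-coded port is to derive the request options from the URL itself. The following is only a sketch under a few assumptions: `headRequestFor` is a hypothetical name that does not appear in the answer, Node 10+ is assumed (so that `URL` is a global), and the URL is assumed to be either http or https.

```javascript
const http = require('http');
const https = require('https');

// Hypothetical helper (not from the answer above): maps a URL string to the
// transport module and the options object for a HEAD request.
function headRequestFor(target) {
  const parsed = new URL(target); // throws a TypeError on malformed input
  const isHttps = parsed.protocol === 'https:';
  return {
    transport: isHttps ? https : http,
    options: {
      method: 'HEAD',
      hostname: parsed.hostname,
      // Use the explicit port if the URL has one, else the protocol default.
      port: parsed.port ? Number(parsed.port) : (isHttps ? 443 : 80),
      // Keep the query string, which pathname alone would drop.
      path: parsed.pathname + parsed.search,
    },
  };
}
```

The returned pair could then be used as `transport.request(options, callback)` in place of the fixed `http.request` call. Redirects are still not followed.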
Answer 2 (score: 7)
Just use the url-exists npm package to test whether a URL exists:
var urlExists = require('url-exists');

urlExists('https://www.google.com', function (err, exists) {
    console.log(exists); // true
});

urlExists('https://www.fakeurl.notreal', function (err, exists) {
    console.log(exists); // false
});
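Answer 3 (score: 3)
Putting 'require' inside the function is the wrong approach in Node. The following ES6 method supports all the correct HTTP statuses and, of course, returns an error if you pass a bad 'host' such as 'fff.kkk':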
const http = require('http');

function checkUrlExists(host, cb) {
    http.request({method: 'HEAD', host, port: 80, path: '/'}, (r) => {
        // >= 200, so that a plain 200 OK counts as existing
        cb(null, r.statusCode >= 200 && r.statusCode < 400);
    }).on('error', cb).end();
}
Answer 4 (score: 1)
Using the other answers as a reference, here is a promisified version that also works with https URIs (for Node 6+):
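const http = require('http');
const https = require('https');
const url = require('url');

// Pick the transport module that matches the URL's protocol.
const request = (opts = {}, cb) => {
    const requester = opts.protocol === 'https:' ? https : http;
    return requester.request(opts, cb);
};

module.exports = target => new Promise((resolve, reject) => {
    let uri;
    try {
        uri = url.parse(target);
    } catch (err) {
        // return, so the code below does not run with uri undefined
        return reject(new Error(`Invalid url ${target}`));
    }

    const options = {
        method: 'HEAD',
        // hostname rather than host, so a port in the URL is not part of the name
        hostname: uri.hostname,
        protocol: uri.protocol,
        port: uri.port,
        path: uri.path,
        timeout: 5 * 1000,
    };

    const req = request(options, (res) => {
        const { statusCode } = res;
        if (statusCode >= 200 && statusCode < 300) {
            resolve(target);
        } else {
            reject(new Error(`Url ${target} not found.`));
        }
    });

    // The timeout option only emits 'timeout'; the request has to be destroyed explicitly.
    req.on('timeout', () => req.destroy(new Error(`Request to ${target} timed out.`)));
    req.on('error', reject);
    req.end();
});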
It can be used like this:
const urlExists = require('./url-exists')

urlExists('https://www.google.com')
    .then(() => {
        console.log('Google exists!');
    })
    .catch(() => {
        console.error('Invalid url :(');
    });
Answer 5 (score: 1)
Check out the url-exists npm package: https://www.npmjs.com/package/url-exists
Setup:
$ npm install url-exists
Usage:
const urlExists = require('url-exists');

urlExists('https://www.google.com', function (err, exists) {
    console.log(exists); // true
});

urlExists('https://www.fakeurl.notreal', function (err, exists) {
    console.log(exists); // false
});
You can also promisify it to take advantage of await and async:
const util = require('util');
const urlExists = util.promisify(require('url-exists'));
// In a CommonJS module, await is only valid inside an async function.
(async () => {
    let isExists = await urlExists('https://www.google.com'); // true
    isExists = await urlExists('https://www.fakeurl.notreal'); // false
})();
Happy coding!
Answer 6 (score: 0)
The async ES6 solution I was looking for, which performs a HEAD request:
const http = require('http');

// options for the http request
const options = {
    host: 'google.de',
    // port: 80, optional
    // path: '/' optional
};

// In a CommonJS module, await is only valid inside an async function.
(async () => {
    // creating a promise (all promises can be awaited)
    const isOk = await new Promise(resolve => {
        // trigger the request ('HEAD' or 'GET'; check first, e.g. with curl, that a
        // HEAD request gives the expected result), then trigger the callback
        const req = http.request({method: 'HEAD', host: options.host, port: options.port, path: options.path}, result =>
            resolve(result.statusCode >= 200 && result.statusCode < 400)
        );
        // Resolve false on failure; passing resolve directly would yield a truthy Error object.
        req.on('error', () => resolve(false));
        req.end();
    });

    // check if the result was NOT ok
    if (!isOk)
        console.error('could not get: ' + options.host);
    else
        console.info('url exists: ' + options.host);
})();
Answer 7 (score: 0)
If you are able to use the request package, you can try the following:
const request = require("request")
const urlExists = url => new Promise((resolve, reject) =>
    request.head(url)
        .on("response", res => resolve(res.statusCode.toString()[0] === "2"))
        // Without an error handler the promise would never settle on a network failure.
        .on("error", reject))
urlExists("https://google.com").then(exists => console.log(exists)) // true
Answer 8 (score: 0)
I see in your code that you are already using the request library, so:
const request = require('request');

request.head('http://...', (error, res) => {
    const exists = !error && res.statusCode === 200;
});
Answer 9 (score: 0)
If you are using axios, you can fetch just the headers like this:
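const axios = require('axios');

const checkUrl = async (url) => {
    try {
        await axios.head(url);
        return true;
    } catch (error) {
        // error.response is undefined for network failures (DNS, refused connection, timeout)
        if (!error.response || error.response.status >= 400) {
            return false;
        }
        throw error;
    }
};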
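You may want to customize the status code range to suit your needs; for example, 401 (Unauthorized) may still mean that the URL exists but that you are not authorized to access it.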
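As a rough illustration of such a customization, the status check can be pulled into a small predicate. This is only a sketch: `statusMeansExists` is a hypothetical name, and treating 401 and 403 as "exists" by default is an assumption you should adjust to your own case.

```javascript
// Hypothetical helper: decides whether an HTTP status code should count as
// "the URL exists". The default list of extra statuses is an assumption.
function statusMeansExists(status, alsoAccept = [401, 403]) {
  // 2xx and 3xx: the server answered for this URL.
  if (status >= 200 && status < 400) {
    return true;
  }
  // Selected error statuses that still imply the resource is there.
  return alsoAccept.includes(status);
}
```

With axios, a predicate like this could probably be passed as the `validateStatus` request option, so that the accepted statuses no longer end up in the catch block.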
Answer 10 (score: 0)
@schlicki pointed out that the request module is now deprecated. One of the alternatives in the link he posted is got:
const got = require('got');

(async () => {
    try {
        const response = await got('https://www.nodesource.com/');
        console.log(response.body);
        //=> '<!doctype html> ...'
    } catch (error) {
        console.log(error.response.body);
        //=> 'Internal server error ...'
    }
})();
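However, with this approach you get the whole HTML page in response.body. In addition, got may come with many features that you do not need. That is why I wanted to add another option to the list. I happen to use the portscanner library, and it can serve the same purpose without downloading the site's content. If the website works with https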
var portscanner = require('portscanner')

// Checks the status of a single port
portscanner.checkPortStatus(80, 'www.google.es', function (error, status) {
    // Status is 'open' if currently in use or 'closed' if available
    console.log(status)
})
In any case, the closest approach is the url-exist module, as @Richie Bendall explains in his post. I just wanted to add some other options.
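For comparison, the same kind of port check can be sketched with Node's built-in net module alone. This is not how portscanner is implemented; `isPortOpen` and its default timeout are assumptions made up for this illustration. Keep in mind that an open port only shows that the host accepts connections, not that a particular path exists.

```javascript
const net = require('net');

// Hypothetical helper: resolves true if a TCP connection to host:port opens
// within timeoutMs, and false on timeout or connection error.
function isPortOpen(port, host, timeoutMs = 3000) {
  return new Promise((resolve) => {
    const socket = net.connect({ port, host });
    const finish = (open) => {
      socket.destroy(); // always release the socket
      resolve(open);    // later calls to resolve are ignored
    };
    socket.setTimeout(timeoutMs);
    socket.once('connect', () => finish(true));
    socket.once('timeout', () => finish(false));
    socket.on('error', () => finish(false));
  });
}
```

It would be called as `isPortOpen(443, 'www.google.es').then(console.log)`, for example.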
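Answer 11 (score: 0)
As of December 2020, I recommend the is-reachable package. It works like a charm.
Answer 12 (score: 0)
It seems that many people recommend using a library, but url-exist includes a dependency on a data-fetching library, so here is a clone of it that uses only native Node modules: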
const http = require('http');
const { parse, URL } = require('url');

// https://github.com/sindresorhus/is-url-superb/blob/main/index.js
function isUrl(str) {
    if (typeof str !== 'string') {
        return false;
    }

    const trimmedStr = str.trim();
    if (trimmedStr.includes(' ')) {
        return false;
    }

    try {
        new URL(str); // eslint-disable-line no-new
        return true;
    } catch {
        return false;
    }
}

// https://github.com/Richienb/url-exist/blob/master/index.js
function urlExists(url) {
    return new Promise((resolve) => {
        if (!isUrl(url)) {
            // return, so that no request is sent for an invalid URL
            return resolve(false);
        }

        const parsed = parse(url);
        const options = {
            method: 'HEAD',
            hostname: parsed.hostname,
            path: parsed.pathname,
            // The request always goes out over plain http on port 80, even for https URLs.
            port: 80,
        };

        const req = http.request(options, (res) => {
            resolve(res.statusCode < 400 || res.statusCode >= 500);
        });
        // A DNS or connection failure would otherwise raise an unhandled 'error' event.
        req.on('error', () => resolve(false));
        req.end();
    });
}

urlExists(
    'https://stackoverflow.com/questions/26007187/node-js-check-if-a-remote-url-exists'
).then(console.log);
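This may also appeal to those who do not want to install a dependency for such a simple purpose.
Answer 13 (score: -1)
There are some really bad answers here. Using a third-party library for such a small piece of code is pretty silly. Try doing some actual programming! danwarfel's answer got me part of the way there, but it is still not quite right: it leaks memory, does not follow redirects, does not support https (which is probably what you want), and does not actually answer the question, since it just logs the headers! Here is my version: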
import * as https from "https";

// Return true if the URL is found and returns 200. Returns false if there are
// network errors or the status code is not 200. It will throw an exception
// for configuration errors (e.g. malformed URLs).
//
// Note this only supports https, not http.
//
async function isUrlFound(url: string, maxRedirects = 20): Promise<boolean> {
    const [statusCode, location] = await new Promise<[number?, string?]>(
        (resolve, _reject) => {
            const req = https.request(
                url,
                {
                    method: "HEAD",
                },
                response => {
                    // This is necessary to avoid memory leaks.
                    response.on("readable", () => response.read());
                    resolve([response.statusCode, response.headers["location"]]);
                },
            );
            req.on("error", _err => resolve([undefined, undefined]));
            req.end();
        },
    );

    if (
        statusCode !== undefined &&
        statusCode >= 300 &&
        statusCode < 400 &&
        location !== undefined &&
        maxRedirects > 0
    ) {
        return isUrlFound(location, maxRedirects - 1);
    }

    return statusCode === 200;
}
Only minimally tested, but it seems to work.
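One caveat about the redirect step: the code above passes the Location header straight back into the next request, which works for absolute URLs but not for relative ones such as "/login". A minimal sketch of resolving the header first, written in plain JavaScript and assuming a hypothetical helper name `resolveRedirect`:

```javascript
// Hypothetical helper: resolves a Location header against the URL that
// produced it, so relative redirects become absolute URLs.
function resolveRedirect(currentUrl, location) {
  // The two-argument URL constructor treats currentUrl as the base;
  // an already absolute location is returned unchanged.
  return new URL(location, currentUrl).toString();
}
```

In the function above, the recursive call could then take `resolveRedirect(url, location)` instead of `location`.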