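I have the following Lambda function, which starts a Databricks cluster without problems when invoked. Now I want to add a second Lambda function and run it after the first one, 60 seconds later. I tried listing the two Lambda functions one after the other, but only the last one executed, and the job failed because the cluster was in the TERMINATED state. Can someone help me run the job once the cluster is up?
Lambda that starts the Databricks cluster: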
const https = require("https");
var tokenstr = "token:xxxxxxxxaaaaaabbbbbccccccc";
exports.handler = (event, context, callback) =>
{
    var data = JSON.stringify({
        "cluster_id": "2222-111000-123abcde"
    });
    var start_cluster_options = {
        host: "aaa.cloud.databricks.com",
        port: 443,
        path: "/api/2.0/clusters/start",
        method: "POST",
        // authentication headers
        headers: {
            "Authorization": "Basic " + Buffer.from(tokenstr).toString("base64"),
            "Content-Type": "application/json",
            "Content-Length": Buffer.byteLength(data)
        }
    };
    var request = https.request(start_cluster_options, function(res){
        var body = "";
        res.on("data", function(data) {
            body += data;
        });
        res.on("end", function() {
            console.log(body);
        });
        res.on("error", function(e) {
            console.log("Got error: " + e.message);
        });
    });
    request.write(data);
    request.end();
};
Function that runs the Databricks job from Lambda:
// Assumes https and tokenstr are defined as in the first snippet.
exports.handler = (event, context, callback) => {
    var data = JSON.stringify({
        "job_id": 11111
    });
    var run_job_options = {
        host: "aaa.cloud.databricks.com",
        port: 443,
        path: "/api/2.0/jobs/run-now",
        method: "POST",
        // authentication headers
        headers: {
            "Authorization": "Basic " + Buffer.from(tokenstr).toString("base64"),
            "Content-Type": "application/json",
            "Content-Length": Buffer.byteLength(data)
        }
    };
    var request = https.request(run_job_options, function(res){
        var body = "";
        res.on("data", function(data) {
            body += data;
        });
        // The rest mirrors the first snippet: log the response, then send the request.
        res.on("end", function() {
            console.log(body);
        });
    });
    request.write(data);
    request.end();
};
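I would like to put both START and RUN_JOB in the same Lambda function. If that is not the best approach, please point me to a better one; I am new to invoking Lambda.
Update:
I modified my code as @Dudemullet suggested and now get the error "2018-08-15T22:28:14.446Z 7dfe42ff-a0da-11e8-9e71-f77e93d8a2f8 Task timed out after 3.00 seconds". I am not sure what I am doing wrong; please help.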
const https = require("https");
var tokenstr = "token:xxxxxxxxaaaaaabbbbbccccccc";
var data = JSON.stringify({
    "cluster_id": "2222-111000-123abcde"
});
var data2 = JSON.stringify({
    "job_id": 11111
});
var start_cluster_options = {
    host: "aaa.cloud.databricks.com",
    port: 443,
    path: "/api/2.0/clusters/start",
    method: "POST",
    // authentication headers
    headers: {
        "Authorization": "Basic " + Buffer.from(tokenstr).toString("base64"),
        "Content-Type": "application/json",
        "Content-Length": Buffer.byteLength(data)
    }
};
var run_job_options = {
    host: "aaa.cloud.databricks.com",
    port: 443,
    path: "/api/2.0/jobs/run-now",
    method: "POST",
    // authentication headers
    headers: {
        "Authorization": "Basic " + Buffer.from(tokenstr).toString("base64"),
        "Content-Type": "application/json",
        "Content-Length": Buffer.byteLength(data2)
    }
};
exports.handler = (event, context, callback) =>
{
    https.request(start_cluster_options, function(res){});
    setTimeout(() => {
        https.request(run_job_options, function(res){});
        callback(); // notify lambda everything is complete
    }, 60);
};
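I normally write Lambda functions in Python, but this function was extended from an example, so I am not sure about the Node.js coding.
****** End of update ******
Ideally, I would like to keep this within AWS Lambda rather than use AWS Step Functions or similar.
Thanks
Answer 0: (score: 0)
You can use AWS Step Functions for this. It is basically a workflow.
At a high level, this is probably what you want to do: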
1) Run your lambda to start the cluster and return cluster id or something.
2) Check cluster status every 10 seconds.
3) If the cluster is up, execute `submit job` lambda function.
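As a rough illustration of those three steps, here is a minimal sketch written as a plain polling loop in Node.js rather than as a Step Functions state machine. The helper name `startThenRunJob` and its three injected functions (`startCluster`, `getClusterState`, `runJob`) are hypothetical; in practice `getClusterState` would probably read the `state` field returned by the Databricks `GET /api/2.0/clusters/get` endpoint, but that wiring is left out here.

```javascript
// Hypothetical helper: start the cluster, poll its state, then run the job.
// startCluster, getClusterState and runJob are caller-supplied async functions.
async function startThenRunJob({ startCluster, getClusterState, runJob,
                                 intervalMs = 10000, maxAttempts = 30 }) {
    await startCluster();
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        const state = await getClusterState();
        if (state === "RUNNING") {
            // Cluster is up: submit the job and hand back its result.
            return runJob();
        }
        // Not ready yet: wait before checking again.
        await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
    throw new Error("Cluster did not reach RUNNING after " + maxAttempts + " checks");
}
```

Polling on the actual cluster state avoids guessing how long startup takes, which a fixed 60-second wait cannot guarantee. If this runs inside a single Lambda, the function's timeout has to cover the whole polling window.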
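Answer 1: (score: 0)
Let's say you abstract this into two functions, startServer and runJob.
Your Lambda will keep running until you call the callback or its execution time limit (the configured timeout) expires. So you can write code like the following.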
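exports.handler = (event, context, callback) => {
    // Start the cluster. https.request() sends nothing until end() is called.
    https.request(start_cluster_options, function (res) {
        res.resume(); // discard the response body
        // setTimeout takes milliseconds, so 60 seconds is 60 * 1000.
        // The function's configured timeout must be longer than this wait.
        setTimeout(() => {
            https.request(run_job_options, function (res) {
                res.resume();
                res.on("end", () => callback()); // notify Lambda everything is complete
            }).end(data2); // data2: the job_id payload from the question
        }, 60 * 1000);
    }).end(data); // data: the cluster_id payload from the question
};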
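Because the Lambda is stopped once its execution time runs out (the "Task timed out after 3.00 seconds" error in the question), it can help to check up front whether the invocation has enough time left for the wait. The sketch below is one way to do that using `context.getRemainingTimeInMillis()`, which the Lambda Node.js context object provides. The helper name `hasTimeToWait` and the 5-second safety margin are assumptions made for illustration.

```javascript
// Hypothetical guard: true if the invocation has enough time left to sit
// through waitMs and still have safetyMarginMs for the follow-up request.
function hasTimeToWait(context, waitMs, safetyMarginMs = 5000) {
    return context.getRemainingTimeInMillis() > waitMs + safetyMarginMs;
}
```

Inside the handler, something like `if (!hasTimeToWait(context, 60 * 1000)) return callback(new Error("Increase the function timeout"));` would fail fast with a clear message instead of timing out halfway through the wait.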
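Another, simpler approach is to use SQS. Lambda functions can now use SQS as an event source, so you can create a message in an SQS queue and set its visibility timeout to whatever delay you need. See: SQS visibility timeout.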
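A minimal sketch of the SQS idea follows, with two caveats. It uses the per-message `DelaySeconds` parameter of the SQS `SendMessage` call (allowed range 0 to 900 seconds) rather than the visibility timeout mentioned above, and it only builds and parses plain objects, so the actual AWS SDK call is left out. The function names `buildDelayedJobMessage` and `jobIdsFromSqsEvent`, and the message body layout, are hypothetical.

```javascript
// Build SendMessage parameters that delay delivery of a "run this job" message.
// Intended for the Lambda that starts the cluster.
function buildDelayedJobMessage(queueUrl, jobId, delaySeconds) {
    if (delaySeconds < 0 || delaySeconds > 900) {
        throw new RangeError("DelaySeconds must be between 0 and 900");
    }
    return {
        QueueUrl: queueUrl,
        MessageBody: JSON.stringify({ job_id: jobId }),
        DelaySeconds: delaySeconds
    };
}

// Extract job ids from the event a Lambda receives from an SQS event source.
// Intended for the Lambda that submits the job.
function jobIdsFromSqsEvent(event) {
    return (event.Records || []).map((record) => JSON.parse(record.body).job_id);
}
```

With this split, the first Lambda starts the cluster and enqueues the delayed message, and the second Lambda is triggered by the queue and calls `/api/2.0/jobs/run-now` for each job id, so neither function has to stay alive during the wait.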