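I am using luigi to execute a chain of tasks: Task1, and Task2 which requires it (both defined in the code below).
This works exactly as desired when I start the whole workflow with luigi.build([Task2(stuff='stuff')]).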
import luigi


class Task1(luigi.Task):
    stuff = luigi.Parameter()

    def output(self):
        return luigi.LocalTarget('test.json')

    def run(self):
        # Write the parameter value to the target file
        with self.output().open('w') as f:
            f.write(self.stuff)


class Task2(luigi.Task):
    stuff = luigi.Parameter()

    def requires(self):
        return Task1(stuff=self.stuff)

    def output(self):
        return luigi.LocalTarget('something-else.json')

    def run(self):
        # Runs only after Task1 has produced its output
        with self.output().open('w') as f:
            f.write(self.stuff)

When using luigi.build, you can also run multiple tasks by explicitly passing arguments, as per this example in the documentation.
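However, in my situation I would also like to be able to run the business logic of Task2 completely independently of the workflow. This works fine for tasks that do not implement requires, as per this example.
My question is: how can I run this method both as part of the workflow and on its own? Obviously I could add a new private method that takes the data and returns the result, and then call it from run, but it feels like something that should be baked into the framework, which makes me think I am misunderstanding Luigi's best practices (I am still learning the framework). Any advice is appreciated, thanks!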
Answer 0 (score: 1)
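It sounds like what you want is dynamic requirements. Using the pattern shown in that example, you can read a configuration, or pass a parameter holding arbitrary data, and yield only the tasks you want to execute, based on fields in that configuration.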
# tasks.py
import json
import os
import time

import luigi


class Parameterizer(luigi.Task):
    params = luigi.Parameter()  # Arbitrary JSON, passed as a string

    def output(self):
        return luigi.LocalTarget('./config.json')

    def run(self):
        # params is already a JSON string: parse it, then write it back as JSON
        with self.output().open('w') as f:
            json.dump(json.loads(self.params), f)


class Task1(luigi.Task):
    stuff = luigi.Parameter()

    def output(self):
        return luigi.LocalTarget('{}'.format(self.stuff[:6]))

    def run(self):
        with self.output().open('w') as f:
            f.write(self.stuff)


class Task2(luigi.Task):
    stuff = luigi.Parameter()
    params = luigi.Parameter()

    def output(self):
        return luigi.LocalTarget('{}'.format(self.stuff[6:]))

    def run(self):
        # Dynamic dependency: yield the config task, then read what it wrote
        config = Parameterizer(params=self.params)
        yield config

        with config.output().open() as f:
            parameters = json.load(f)

        # Only depend on Task1 if the config asks for it
        if parameters["runTask1"]:
            yield Task1(stuff=self.stuff)
        else:
            pass

        with self.output().open('w') as f:
            f.write(self.stuff)


if __name__ == '__main__':
    cf_json = '{"runTask1": true}'
    print("Trying to run with Task1...")
    luigi.build([Task2(stuff="Task 1Task 2", params=cf_json)], local_scheduler=True)

    time.sleep(10)
    # config.json has a fixed path, so remove it or the old config is reused
    if os.path.exists('./config.json'):
        os.remove('./config.json')

    cf_json = '{"runTask1": false}'
    print("Trying to run WITHOUT Task1...")
    luigi.build([Task2(stuff="Task 1Did just task 2", params=cf_json)], local_scheduler=True)
(Just run python tasks.py to execute it.)
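We can easily imagine mapping multiple parameters to multiple tasks, or applying custom tests before allowing various tasks to execute. We could also rewrite this so that the parameters live in a luigi.Config.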
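As a rough sketch of mapping several configuration fields to several tasks: select_dependencies, the registry dict and the runTask3 flag are hypothetical names introduced here, and plain strings stand in for luigi task instances so that the snippet runs without luigi installed.

```python
import json


def select_dependencies(parameters, registry):
    """Return the dependencies whose flag is truthy in the parsed config.

    `registry` maps a config field name to a zero-argument factory. Inside a
    real Task.run() each factory would build a luigi task to be yielded.
    """
    return [factory() for flag, factory in registry.items()
            if parameters.get(flag)]


# Stand-in factories; in a workflow these might be e.g. lambda: Task1(stuff=...)
registry = {
    "runTask1": lambda: "Task1",
    "runTask3": lambda: "Task3",
}

parameters = json.loads('{"runTask1": true, "runTask3": false}')
selected = select_dependencies(parameters, registry)
print(selected)  # ['Task1']
```

Inside run, the resulting list could then be yielded in one go, in the same way the list of tasks is yielded in the data-dependent example in this answer.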
Also note the following control flow from Task2:
if parameters["runTask1"]:
    yield Task1(stuff=self.stuff)
else:
    pass
Here we could run an alternative task, or invoke tasks dynamically, as seen in the example from the luigi repo. For example:
if parameters["runTask1"]:
    yield Task1(stuff=self.stuff)
else:
    # self.stuff is not automatically parsed to int, so it is still a string
    # and this list comprehension builds one Task1 per character
    data_dependent_deps = [Task1(stuff=x) for x in self.stuff]
    yield data_dependent_deps
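This may be quite a bit more involved than a simple run_standalone() method, but I think it is the closest match to what you are looking for among the documented luigi patterns.
Source: https://luigi.readthedocs.io/en/stable/tasks.html?highlight=dynamic#dynamic-dependencies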