Luigi: passing variables between tasks as output

Time: 2019-05-09 17:48:52

Tags: python var luigi

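I know I can pass variables between tasks by saving them to a file and then reading that file in the next task. But I don't want to generate too many files in my project, so I am trying to pass the value directly as a variable.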

My code is as follows:

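import csv
import datetime
import json

import luigi

# Streaming, filter_spam and lemmatize are defined elsewhere (not shown).
class FilterSpam(luigi.Task):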
    time_slice = luigi.parameter.DateMinuteParameter(interval=30, default=datetime.datetime.today())

    filtered_tweets = []

    def requires(self):
        return Streaming(time_slice=self.time_slice)

    def run(self):
        with self.input().open('r') as infile:
            reader = csv.DictReader(infile, delimiter='\t')
            tweets = list(reader)
            self.filtered_tweets, spam = filter_spam(tweets, 0.7)

        with open('data/results/detected_spam.txt', 'a') as spam_file:
            for tweet in spam:
                spam_json = json.dumps(tweet, ensure_ascii=False)
                spam_file.write(spam_json+'\n')

    def output(self):
        return self.filtered_tweets

class LemmatizeTweets(luigi.Task):
    time_slice = luigi.parameter.DateMinuteParameter(interval=30, default=datetime.datetime.today())

    def requires(self):
        return FilterSpam(time_slice=self.time_slice)

    def run(self):
        filtered_tweets = self.input()  # list of dicts

        lemmatized_tweets = lemmatize(filtered_tweets)

        with self.output().open('w') as outfile:
            for tweet in lemmatized_tweets:
                tweet_json = json.dumps(tweet, ensure_ascii=False)
                outfile.write(tweet_json+'\n')

    def output(self):
        return luigi.LocalTarget('data/processed/{}.csv'.format(self.time_slice))

Here filtered_tweets is a list of dictionaries.

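Is it possible to pass this variable to the LemmatizeTweets task without saving it to a file? If not, what is the best way to save the value? As a pickle, as a .txt file with one JSON object per line, ...?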
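For reference, the second option mentioned above (a text file with one JSON object per line) can be sketched with the standard library alone. This is only a minimal illustration: the helper names write_jsonl and read_jsonl and the sample data are hypothetical, and in a Luigi task the file would normally be opened through the target returned by output() rather than a plain path.

```python
import json
import os
import tempfile


def write_jsonl(records, path):
    """Write each dict in `records` as one JSON object per line."""
    with open(path, 'w', encoding='utf-8') as outfile:
        for record in records:
            outfile.write(json.dumps(record, ensure_ascii=False) + '\n')


def read_jsonl(path):
    """Read the file back into a list of dicts, skipping blank lines."""
    with open(path, 'r', encoding='utf-8') as infile:
        return [json.loads(line) for line in infile if line.strip()]


# Hypothetical sample data standing in for filtered_tweets.
sample_tweets = [{'id': 1, 'text': 'hola'}, {'id': 2, 'text': 'adiós'}]

# Round trip through a temporary file to show the list survives unchanged.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'filtered.jsonl')
    write_jsonl(sample_tweets, path)
    restored = read_jsonl(path)
```

Compared with a pickle, a file like this stays readable in a text editor and from other languages. A pickle can hold arbitrary Python objects, but it should only be loaded from a trusted source.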

0 answers:

No answers yet.