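I know I can pass variables between tasks by saving them to a file and reading that file in the next task. However, I don't want to generate too many files in my project, so I am trying to pass the value directly as a variable.
My code is as follows: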
    import csv
    import datetime
    import json

    import luigi

    # Streaming, filter_spam and lemmatize are defined elsewhere in my project.

    class FilterSpam(luigi.Task):
        time_slice = luigi.parameter.DateMinuteParameter(interval=30, default=datetime.datetime.today())
        filtered_tweets = []

        def requires(self):
            return Streaming(time_slice=self.time_slice)

        def run(self):
            with self.input().open('r') as infile:
                reader = csv.DictReader(infile, delimiter='\t')
                tweets = list(reader)
            self.filtered_tweets, spam = filter_spam(tweets, 0.7)
            # Detected spam is appended to a log file, one JSON object per line.
            with open('data/results/detected_spam.txt', 'a') as spam_file:
                for tweet in spam:
                    spam_json = json.dumps(tweet, ensure_ascii=False)
                    spam_file.write(spam_json + '\n')

        def output(self):
            # Returns a plain list instead of a luigi.Target; this is the part I am asking about.
            return self.filtered_tweets

    class LemmatizeTweets(luigi.Task):
        time_slice = luigi.parameter.DateMinuteParameter(interval=30, default=datetime.datetime.today())

        def requires(self):
            return FilterSpam(time_slice=self.time_slice)

        def run(self):
            filtered_tweets = self.input()  # list of dictionaries
            lemmatized_tweets = lemmatize(filtered_tweets)
            with self.output().open('w') as outfile:
                for tweet in lemmatized_tweets:
                    tweet_json = json.dumps(tweet, ensure_ascii=False)
                    outfile.write(tweet_json + '\n')

        def output(self):
            return luigi.LocalTarget('data/processed/{}.csv'.format(self.time_slice))
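Here filtered_tweets is a list of dictionaries.
Is it possible to pass this variable to the LemmatizeTweets task without saving it to a file? If not, what is the best way to save the values? In a pickle, in a .txt with one JSON object per line ...?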
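To make the last option concrete, this is roughly what I mean by "one JSON object per line". It is only a sketch using the standard library: the helper names write_jsonl and read_jsonl, the sample data, and the temporary path are made up for illustration and are not part of my project or of Luigi.

```python
import json
import os
import tempfile


def write_jsonl(path, records):
    """Write each dict in `records` as one JSON object per line."""
    with open(path, 'w', encoding='utf-8') as outfile:
        for record in records:
            outfile.write(json.dumps(record, ensure_ascii=False) + '\n')


def read_jsonl(path):
    """Read the file back into a list of dicts, skipping blank lines."""
    with open(path, 'r', encoding='utf-8') as infile:
        return [json.loads(line) for line in infile if line.strip()]


# Made-up sample data standing in for filtered_tweets.
sample_tweets = [{'id': '1', 'text': 'hola'}, {'id': '2', 'text': 'adiós'}]

# Hypothetical location; in the real pipeline this would be the task's output path.
path = os.path.join(tempfile.mkdtemp(), 'filtered.jsonl')

write_jsonl(path, sample_tweets)
restored = read_jsonl(path)
```

Inside the tasks, I assume the same loops would write to self.output().open('w') in FilterSpam and read from self.input().open('r') in LemmatizeTweets, with FilterSpam.output() returning a luigi.LocalTarget instead of the list.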