Luigi upstream wrapper task fails even though all other tasks run to completion

Time: 2019-04-28 00:03:52

Tags: python pipeline directed-acyclic-graphs luigi

My code:

import luigi
import pickle
from datetime import datetime


class QueryTwitterTrend(luigi.ExternalTask):

    date = luigi.DateMinuteParameter(default=datetime.now())
    country_code = luigi.Parameter(default='usa')

    def requires(self):
        return []

    def output(self, **kwargs):
        kwargs.setdefault('loc', 'dummy')

        return luigi.LocalTarget(
            "data/trends/trends_{}_{}.csv".format(self.date.strftime('%m%d_%Y_%H%M'), kwargs['loc']))

    def run(self):
        from retrieve_trends import run as retrieve_trends
        import pandas as pd

        args_dict = {
            'location': [self.country_code]
        }

        df_container = retrieve_trends(args_dict)
        f = self.output(loc=self.country_code).open('w')
        df_container[self.country_code].to_csv(f, sep=',', encoding='utf-8')
        f.close()


class TrendsTaskWrapper(luigi.WrapperTask):

    def requires(self):
        locations = [
            'usa-nyc',
            'usa-lax',
            'usa-chi',
            'usa-dal',
            'usa-hou',
            'usa-wdc',
            'usa-mia',
            'usa-phi',
            'usa-atl',
            'usa-bos',
            'usa-phx',
            'usa-sfo',
            'usa-det',
            'usa-sea',
        ]

        for loc in locations:
            yield QueryTwitterTrend(country_code=loc)

Luigi execution summary:

===== Luigi Execution Summary =====

Scheduled 15 tasks of which:
* 14 ran successfully:
    - 14 QueryTwitterTrend(date=2019-04-27T1955, country_code=usa-atl) ...
* 1 failed:
    - 1 TrendsTaskWrapper()

This progress looks :( because there were failed tasks

===== Luigi Execution Summary =====

Traceback:

RuntimeError: Unfulfilled dependencies at run time:QueryTwitterTrend_usa_nyc_2019_04_27T1955_c4a2592db0, QueryTwitterTrend_usa_lax_2019_04_27T1955_e2676d1bef..

In the Luigi daemon, I get an UPSTREAM_ERROR.

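I don't know what is causing this error. All of my QueryTwitterTrend tasks work as far as creating the local files goes, and Luigi reports them as done. The problem is that the wrapper task (TrendsTaskWrapper) is marked as failed.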

What can I do to prevent this while making sure all dependencies are still tracked? I will soon have more tasks that depend on QueryTwitterTrend.
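One possible explanation, offered as a sketch and not confirmed anywhere in this thread: Luigi's default `Task.complete()` calls `output()` with no arguments and checks whether the returned target exists. In the code above, `output()` then falls back to `loc='dummy'`, while `run()` writes to the path built from `self.country_code`, so the completeness check would look for a file that was never written. That would be consistent with the "Unfulfilled dependencies at run time" error on the wrapper. The plain-Python stand-in below reproduces that mismatch without importing luigi; `TrendTaskSketch`, `output_from_kwargs`, `output_from_params`, `is_complete`, and `demo` are hypothetical names used only for illustration.

```python
import os
import tempfile
from datetime import datetime


class TrendTaskSketch:
    """Plain-Python stand-in for QueryTwitterTrend (hypothetical, not a luigi.Task)."""

    def __init__(self, date, country_code, base_dir):
        self.date = date
        self.country_code = country_code
        self.base_dir = base_dir

    def _path(self, loc):
        # Same file-name pattern as in the question, rooted in a temp directory.
        name = "trends_{}_{}.csv".format(self.date.strftime('%m%d_%Y_%H%M'), loc)
        return os.path.join(self.base_dir, name)

    def output_from_kwargs(self, **kwargs):
        # Mirrors the question's output(): the path depends on a keyword argument.
        kwargs.setdefault('loc', 'dummy')
        return self._path(kwargs['loc'])

    def output_from_params(self):
        # Alternative: the path depends only on the task's own parameters.
        return self._path(self.country_code)

    def run(self):
        # Mirrors the question's run(): writes to the path for the real location.
        with open(self.output_from_kwargs(loc=self.country_code), 'w') as f:
            f.write("trend\n")


def is_complete(output_fn):
    # Imitates the default completeness check: call output() with no
    # arguments and test whether that path exists.
    return os.path.exists(output_fn())


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        task = TrendTaskSketch(datetime(2019, 4, 27, 19, 55), 'usa-nyc', tmp)
        task.run()
        return (is_complete(task.output_from_kwargs),
                is_complete(task.output_from_params))


if __name__ == "__main__":
    print(demo())
```

Here `demo()` returns `(False, True)`: the keyword-based variant checks for the `dummy` file and reports the task as incomplete, while the parameter-based variant finds the file that `run()` wrote. If this is indeed the cause, the corresponding change in the real task would be for `output()` to take no keyword arguments and build the path from `self.country_code`, with `run()` calling `self.output()` directly. Separately, `default=datetime.now()` is evaluated once when the class is defined, and `luigi.ExternalTask` is intended for tasks without a `run()` method, so `luigi.Task` may be the better base class here.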

0 answers