Luigi Python - requires dependency breaks script. Error - "Unfulfilled dependency at run time"

Date: 2019-03-19 15:27:16

Tags: python luigi

I'm new to Python and am looking to work Luigi into some of my Python data-processing scripts. I have two tasks: one scrapes some data from the web and creates a CSV, and the next task (which depends on the first task's CSV file) runs a SQL Server proc to dump the CSV data into a database. When I run these tasks separately they work fine, but as soon as I add the requirement it gives me the error you can see in the title.

Please let me know what I'm doing wrong.

The full Luigi error is as follows:

RuntimeError: Traceback (most recent call last):
  File "C:\Users\somepath\Luigi\venv\python\lib\site-packages\luigi\worker.py", line 182, in run
    raise RuntimeError('Unfulfilled %s at run time: %s' % (deps, ', '.join(missing)))
RuntimeError: Unfulfilled dependency at run time: Generate_TV_WebScraping_File_X__DataMining_Lu_8213e479cf
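
From digging around in luigi/worker.py, the check that raises this seems to be: just before a task runs, every task returned by its requires() must report complete() == True, and the default complete() only returns True once every target returned by output() exists. A rough sketch of my understanding of that check (simplified and illustrative, not the exact library source):

# Simplified sketch of luigi's pre-run dependency check (illustrative only).
def task_is_complete(task):
    # Task.complete() defaults to "every target returned by output() exists".
    outputs = task.output()
    if not isinstance(outputs, (list, tuple)):
        outputs = [outputs] if outputs is not None else []
    if not outputs:
        return False  # with no outputs, luigi never considers the task done
    return all(target.exists() for target in outputs)

def check_dependencies(task):
    # The worker verifies every requirement before letting run() execute.
    deps = task.requires()
    if not isinstance(deps, (list, tuple)):
        deps = [deps] if deps is not None else []
    missing = [dep for dep in deps if not task_is_complete(dep)]
    if missing:
        raise RuntimeError('Unfulfilled dependency at run time: %s' % missing)

So if output() is never found on a required task, its default complete() can never return True, and the worker reports the dependency as unfulfilled even after the task has actually run.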

Apologies for the indentation etc. in the code example below - the formatting changed when pasting.

My current code is as follows:

import requests
from bs4 import BeautifulSoup
import re
import pyodbc
import luigi
import arrow

class Generate_TV_WebScraping_File(luigi.ExternalTask):

    input_path = luigi.Parameter('X:/somefilepath/Televised_Football_Staging.csv')

    def ouptut(self):
        return luigi.LocalTarget(self.input_path)

    def run(self):
        ################################ GET DATA FROM WEBSITE ###############################################

        ## set url
        page_link = 'https://www.somewebsite.html'

        ## request access with timeout of 5 seconds
        page_response = requests.get(page_link, timeout=5)

        ## BS to parse the html
        page_content = BeautifulSoup(page_response.content, "html.parser")

        ## find all content related to match fixtures div class
        div_team = page_content.findAll('div', attrs={"class":"span4 matchfixture"})

        clean_date = ''

        ## set path and file for data export
        f = open("X:\somefilepath\Televised_Football_Staging.csv", "w")

        ## for all the content in div class 'row-fluid'
        for rows in page_content.findAll('div', attrs={"class": "row-fluid"}):
            ## if the content div class is a match date
            if rows.findAll('div', attrs={"class": "span12 matchdate"}):
                ## save it to the variable 'date_row'
                date_row = rows.findAll('div', attrs={"class": "span12 matchdate"})
                ## clean it by removing html tags and comma separating
                concat_rows = ",".join(str(x) for x in date_row)
                clean_date = re.sub("<.*?>", " ", concat_rows)
            ## otherwise, when the 'row-fluid' content is a match fixture rather than a match date
            elif rows.findAll('div', attrs={"class": "span4 matchfixture"}):
                ## clean it by removing html tags and comma separating
                concat_rows = ",".join(str(x) for x in rows)
                clean_rows = re.sub("<.*?>", " ", concat_rows)
                ## write the fixture concatenated with its date
                f.write('%s\n' % (clean_rows + "," + clean_date))

        ## Close csv
        f.close()

        #######################################################################################################


class Insert_TV_WebScraping_To_Db(luigi.Task):

    def requires(self):
        return Generate_TV_WebScraping_File(self.input_path)

    def ouptut(self):
        sys_date = arrow.now().format('YYYYMMDD')
        return luigi.LocalTarget('X:/somefilepath/tv_webscrape_log_' + sys_date + '.txt')

    def run(self):
        ############################### INSERT DATA INTO DATABASE ###################################################

        ## set sql connection string to DataMiningDev
        cnxn = pyodbc.connect(driver="{SQL Server}", server="someserver", database="somedatabase", autocommit=True)

        ## run sql query
        cursor = cnxn.cursor()
        cursor.execute('EXEC somedatabase.someschema.somedbproc')

        ## being kind
        cnxn.close()

        #############################################################################################################


# Run Luigi Tasks #
#luigi.run(main_task_cls=Generate_TV_WebScraping_File)
luigi.run(main_task_cls=Insert_TV_WebScraping_To_Db)
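
Possibly relevant to the code above (my own reading, not a confirmed fix): both tasks define ouptut rather than output, so Luigi never finds the targets and their default complete() can never return True; Insert_TV_WebScraping_To_Db also uses self.input_path in requires() without declaring input_path as a luigi.Parameter, and its run() never writes the log file its output() points at. A minimal sketch of how the two tasks might be wired up with those changed - class names and paths kept from the question, run() bodies reduced to placeholders, untested:

import luigi
import arrow

## A plain luigi.Task rather than ExternalTask, since this task does the work
## itself; ExternalTask is meant for inputs produced outside of luigi.
class Generate_TV_WebScraping_File(luigi.Task):

    input_path = luigi.Parameter(default='X:/somefilepath/Televised_Football_Staging.csv')

    ## spelled 'output' - luigi only looks for a method with this exact name
    def output(self):
        return luigi.LocalTarget(self.input_path)

    def run(self):
        ## the web-scraping body from the question goes here; writing via
        ## self.output() keeps run() and output() pointing at the same file
        with self.output().open('w') as f:
            f.write('scraped rows go here\n')  # placeholder


class Insert_TV_WebScraping_To_Db(luigi.Task):

    ## declare the parameter this task passes down to its requirement
    input_path = luigi.Parameter(default='X:/somefilepath/Televised_Football_Staging.csv')

    def requires(self):
        return Generate_TV_WebScraping_File(self.input_path)

    def output(self):
        sys_date = arrow.now().format('YYYYMMDD')
        return luigi.LocalTarget('X:/somefilepath/tv_webscrape_log_' + sys_date + '.txt')

    def run(self):
        ## the pyodbc proc call from the question goes here; writing the log
        ## target afterwards is what marks this task as complete
        with self.output().open('w') as log:
            log.write('loaded ' + self.input_path + '\n')

With output() spelled correctly, luigi.run(main_task_cls=Insert_TV_WebScraping_To_Db) should see the CSV target appear once the first task has run; luigi.build([Insert_TV_WebScraping_To_Db()], local_scheduler=True) is an equivalent way to launch the pipeline from a script.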

0 Answers