I'm new to Python and am looking to work Luigi into some of my Python data-processing scripts. I have two tasks: the first scrapes some data from the web and creates a CSV; the second (which depends on the first task's CSV file) runs a SQL Server proc to load the CSV data into a database. When I run the tasks separately they work fine, but as soon as I add the requirement I get the error you can see in the title.
Please can you tell me what I'm doing wrong?
The full Luigi error is as follows:

Traceback (most recent call last):
  File "C:\Users\somepath\Luigi\venv\python\lib\site-packages\luigi\worker.py", line 182, in run
    raise RuntimeError('Unfulfilled %s at run time: %s' % (deps, ', '.join(missing)))
RuntimeError: Unfulfilled dependency at run time: Generate_TV_WebScraping_File_X__DataMining_Lu_8213e479cf
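For context on what the error means: Luigi decides whether a dependency is "fulfilled" by calling the required task's `complete()` method, which by default just checks that every target returned by `output()` exists on disk. A minimal sketch of that check, using plain-Python stubs rather than Luigi itself (the class and path names here are illustrative, not from the real project):

```python
import os
import tempfile

class StubTarget:
    """Stands in for luigi.LocalTarget: a target is a file that may or may not exist."""
    def __init__(self, path):
        self.path = path

    def exists(self):
        return os.path.exists(self.path)

class StubTask:
    """Mimics luigi.Task's default complete(): done when all outputs exist."""
    def __init__(self, path):
        self.path = path

    def output(self):
        return StubTarget(self.path)

    def complete(self):
        # This is the check Luigi runs after a required task finishes;
        # if it stays False, the worker raises "Unfulfilled dependency at run time".
        return self.output().exists()

# A task whose output file was actually written is considered complete.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    done_path = tmp.name
print(StubTask(done_path).complete())   # True: the file exists
os.remove(done_path)
print(StubTask(done_path).complete())   # False: the file is gone
```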
Apologies if the indentation etc. of the code sample below is off; the formatting changed when pasting.
My current code is as follows:
import requests
from bs4 import BeautifulSoup
import re
import pyodbc
import luigi
import arrow
class Generate_TV_WebScraping_File(luigi.ExternalTask):
    input_path = luigi.Parameter('X:/somefilepath/Televised_Football_Staging.csv')

    def ouptut(self):
        return luigi.LocalTarget(self.input_path)

    def run(self):
        ################################ GET DATA FROM WEBSITE ###############################################
        ## set url
        page_link = 'https://www.somewebsite.html'
        ## request access with timeout of 5 seconds
        page_response = requests.get(page_link, timeout=5)
        ## BS to parse the html
        page_content = BeautifulSoup(page_response.content, "html.parser")
        ## find all content related to match fixtures div class
        div_team = page_content.findAll('div', attrs={"class": "span4 matchfixture"})
        clean_date = ''
        ## set path and file for data export
        f = open("X:\somefilepath\Televised_Football_Staging.csv", "w")
        ## for all the content in div class 'row-fluid'
        for rows in page_content.findAll('div', attrs={"class": "row-fluid"}):
            ## if the content div class is match date
            if rows.findAll('div', attrs={"class": "span12 matchdate"}):
                ## save it to the variable 'date_row'
                date_row = rows.findAll('div', attrs={"class": "span12 matchdate"})
                ## clean it by removing html tags and comma separate
                concat_rows = ",".join(str(x) for x in date_row)
                clean_date = re.sub("<.*?>", " ", concat_rows)
            ## when it is not a match date in the div class 'row-fluid' and it is the match fixture content
            elif rows.findAll('div', attrs={"class": "span4 matchfixture"}):
                ## clean it by removing html tags and comma separate
                concat_rows = ",".join(str(x) for x in rows)
                clean_rows = re.sub("<.*?>", " ", concat_rows)
                ## print the content and concatenate with date
                f.write('%s\n' % (clean_rows + "," + clean_date))
        ## Close csv
        f.close()
        #######################################################################################################
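As an aside on the CSV writing above: joining fields with `","` by hand produces a malformed row if a scraped value ever contains a comma. A small sketch of the same strip-tags-then-write step using the standard `csv` module instead (the sample markup and output path below are made up for illustration):

```python
import csv
import os
import re
import tempfile

def strip_tags(html):
    """Same cleaning idea as the scraper above: drop tags, keep the text."""
    return re.sub(r"<.*?>", " ", html).strip()

# Made-up fixture/date markup standing in for the scraped divs.
scraped = [
    ("<div>Team A v Team B, 17:30</div>", "<div>Saturday 1 January</div>"),
]

# Illustrative temp path; the real script writes to X:/somefilepath/...
fd, out_path = tempfile.mkstemp(suffix=".csv")
os.close(fd)

with open(out_path, "w", newline="") as f:
    writer = csv.writer(f)
    for fixture_html, date_html in scraped:
        # csv.writer quotes the fixture field because it contains a comma.
        writer.writerow([strip_tags(fixture_html), strip_tags(date_html)])
```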
class Insert_TV_WebScraping_To_Db(luigi.Task):
    def requires(self):
        return Generate_TV_WebScraping_File(self.input_path)

    def ouptut(self):
        sys_date = arrow.now().format('YYYYMMDD')
        return luigi.LocalTarget('X:/somefilepath/tv_webscrape_log_' + sys_date + '.txt')

    def run(self):
        ############################### INSERT DATA INTO DATABASE ###################################################
        ## set sql connection string to DataMiningDev
        cnxn = pyodbc.connect(driver="{SQL Server}", server="someserver", database="somedatabase", autocommit=True)
        ## run sql query
        cursor = cnxn.cursor()
        cursor.execute('EXEC somedatabase.someschema.somedbproc')
        ## being kind
        cnxn.close()
        #############################################################################################################
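The database step follows the standard DB-API 2.0 connect/cursor/execute/close shape, which pyodbc shares with the standard library's `sqlite3`. A runnable sketch of that same flow using an in-memory `sqlite3` database as a stand-in (the table and column names are invented for the example), so it can be tried without a SQL Server:

```python
import sqlite3

# sqlite3 stands in for pyodbc here; both follow the DB-API 2.0 pattern.
cnxn = sqlite3.connect(":memory:")
cursor = cnxn.cursor()

# Invented staging table standing in for the stored procedure's target.
cursor.execute("CREATE TABLE staging (fixture TEXT, matchdate TEXT)")
# Parameterised insert: the driver handles quoting, as pyodbc does with '?'.
cursor.execute("INSERT INTO staging VALUES (?, ?)",
               ("Team A v Team B", "Saturday 1 January"))
cnxn.commit()

cursor.execute("SELECT COUNT(*) FROM staging")
row_count = cursor.fetchone()[0]

## being kind, as above
cnxn.close()
```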
# Run Luigi Tasks #
#luigi.run(main_task_cls=Generate_TV_WebScraping_File)
luigi.run(main_task_cls=Insert_TV_WebScraping_To_Db)