I am trying to combine dask with luigi. The business logic itself works fine, but when I run the Luigi task the code starts throwing an error:
raise ValueError('url type not understood: %s' % urlpath)
ValueError: url type not understood: <_io.TextIOWrapper name='../data/2017_04_11_oldsource_geocoded.csv-luigi-tmp-1647603946' mode='wb' encoding='UTF-8'>
Here is the code (I removed the business-model part to keep it short):
import pandas as pd
import geopandas as gp
from geopandas.tools import sjoin
from dask import dataframe as dd
from shapely.geometry import Point
from os import path
import luigi
class geocode_tweets(luigi.Task):
    boundaries = _load_geoboundaries()
    nyc = boundaries[0].unary_union

    def requires(self):
        return []

    def output(self):
        self.path = '../data/2017_04_11_oldsource_geocoded.csv'
        return luigi.LocalTarget(self.path)

    def run(self):
        df = dd.read_csv(path.join(data_dir, '2017_03_22_oldsource.csv'))
        df['geometry'] = df.apply(_get_point, axis=1)
        meta = _form_meta(df)
        S = df.map_partitions(
            distributed_sjoin, boundaries=self.boundaries,
            nyc_border=self.nyc, meta=meta).drop('geometry', axis=1)
        f = self.output().open('w')
        S.to_csv(f)
        f.close()
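The problem appears to be in the output part.
As far as I can tell, dask does not accept the file object that Luigi opens as a substitute for a path string.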
Answer 0 (score: 3)
Dask defines DataFrame.to_csv(filename, **kwargs), and you are passing a file object rather than a filename. Replace the last three lines with:
S.to_csv(self.output().path)
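One thing to keep in mind: the traceback shows that Luigi's open('w') hands out a temporary "-luigi-tmp-" file, so writing straight to self.output().path skips that step. The following is only a rough, stdlib-only sketch of how a path-only writer can still be combined with a temporary file and a final rename. The names write_csv_to_path and write_atomically are hypothetical stand-ins, not part of dask or Luigi, and the ValueError merely imitates the message from the question.

```python
import csv
import os
import tempfile


def write_csv_to_path(path, rows):
    """Hypothetical stand-in for a writer that only accepts a path string."""
    if not isinstance(path, str):
        # Imitates the failure in the question: a file object is not a path.
        raise ValueError('url type not understood: %s' % path)
    with open(path, 'w', newline='') as handle:
        csv.writer(handle).writerows(rows)


def write_atomically(final_path, rows):
    """Write to a temporary path next to final_path, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(final_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    os.close(fd)  # the writer reopens the file by name
    try:
        write_csv_to_path(tmp_path, rows)
        os.replace(tmp_path, final_path)  # the final file appears only on success
    except BaseException:
        # Clean up the partial temporary file before re-raising.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Inside a real task, the same idea would mean passing a temporary path string (not a file object) to S.to_csv and renaming afterwards. Whether that extra step is worth it depends on how much the pipeline relies on the output file only existing once the task has finished.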