dask dataframe set_index throws an error

Asked: 2017-10-08 01:19:15

Tags: dask dask-distributed

I have a dask dataframe created from Parquet files stored on HDFS. When I set an index with set_index, the call fails with the error shown in the traceback further down.
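For context, a minimal sketch of the kind of workflow involved is below; the scheduler address, HDFS path, and index column are placeholders I have chosen for illustration, not details taken from the original question:

    import dask.dataframe as dd
    from dask.distributed import Client

    # connect to the distributed scheduler (address is a placeholder)
    client = Client("scheduler-host:8786")

    # read the Parquet dataset from HDFS (path is a placeholder)
    df = dd.read_parquet("hdfs:///data/events.parquet")

    # setting the index triggers a shuffle; this is the call that raises
    # the TypeError shown in the traceback below
    df = df.set_index("timestamp")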

  

    File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/dask/dataframe/shuffle.py", line 64, in set_index
      divisions, sizes, mins, maxes = base.compute(divisions, sizes, mins, maxes)
    File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/dask/base.py", line 206, in compute
      results = get(dsk, keys, **kwargs)
    File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/client.py", line 1949, in get
      results = self.gather(packed, asynchronous=asynchronous)
    File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/client.py", line 1391, in gather
      asynchronous=asynchronous)
    File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/client.py", line 561, in sync
      return sync(self.loop, func, *args, **kwargs)
    File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/utils.py", line 241, in sync
      six.reraise(*error[0])
    File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/six.py", line 693, in reraise
      raise value
    File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/utils.py", line 229, in f
      result[0] = yield make_coro()
    File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/tornado/gen.py", line 1055, in run
      value = future.result()
    File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/tornado/concurrent.py", line 238, in result
      raise_exc_info(self._exc_info)
    File "", line 4, in raise_exc_info
    File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/tornado/gen.py", line 1063, in run
      yielded = self.gen.throw(*exc_info)
    File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/client.py", line 1269, in _gather
      traceback)
    File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/six.py", line 692, in reraise
      raise value.with_traceback(tb)
    File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/dask/dataframe/io/parquet.py", line 144, in _read_parquet_row_group
      open=open, assign=views, scheme=scheme)
    TypeError: read_row_group_file() got an unexpected keyword argument 'scheme'

Can someone point out the cause of this error and how to fix it?

1 answer:

Answer 0 (score: 2)

Solution

Upgrade fastparquet to version 0.1.3.
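As a quick sanity check (not part of the original answer), you can print the installed fastparquet version and upgrade if it is older than 0.1.3:

    import fastparquet

    # dask 0.15.4 passes the 'scheme' keyword to read_row_group_file(),
    # which only fastparquet >= 0.1.3 accepts
    print(fastparquet.__version__)

    # upgrade with either of:
    #   pip install "fastparquet>=0.1.3"
    #   conda install -c conda-forge "fastparquet>=0.1.3"

Since the traceback shows the error being re-raised through distributed's client.gather, the upgrade needs to be applied in the worker environments as well, not only on the client.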

Details

Dask 0.15.4, which you are using in your example, includes this commit, which adds the scheme parameter to read_row_group_file(). That raises an error with fastparquet versions earlier than 0.1.3.