I have a dask dataframe created from a parquet file on HDFS. When I set the index with the set_index API, it fails with the following error.
    File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/dask/dataframe/shuffle.py", line 64, in set_index
        divisions, sizes, mins, maxes = base.compute(divisions, sizes, mins, maxes)
    File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/dask/base.py", line 206, in compute
        results = get(dsk, keys, **kwargs)
    File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/client.py", line 1949, in get
        results = self.gather(packed, asynchronous=asynchronous)
    File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/client.py", line 1391, in gather
        asynchronous=asynchronous)
    File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/client.py", line 561, in sync
        return sync(self.loop, func, *args, **kwargs)
    File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/utils.py", line 241, in sync
        six.reraise(*error[0])
    File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/six.py", line 693, in reraise
        raise value
    File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/utils.py", line 229, in f
        result[0] = yield make_coro()
    File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/tornado/gen.py", line 1055, in run
        value = future.result()
    File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/tornado/concurrent.py", line 238, in result
        raise_exc_info(self._exc_info)
    File "", line 4, in raise_exc_info
    File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/tornado/gen.py", line 1063, in run
        yielded = self.gen.throw(*exc_info)
    File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/client.py", line 1269, in _gather
        traceback)
    File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/six.py", line 692, in reraise
        raise value.with_traceback(tb)
    File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/dask/dataframe/io/parquet.py", line 144, in _read_parquet_row_group
        open=open, assign=views, scheme=scheme)
    TypeError: read_row_group_file() got an unexpected keyword argument 'scheme'
Can someone point out the cause of this error and how to fix it?
Answer 0 (score: 2)
Upgrade fastparquet to version 0.1.3.
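Dask 0.15.4, which your example uses, includes this commit, which adds the scheme parameter to the call to read_row_group_file(). fastparquet versions earlier than 0.1.3 do not accept that parameter, so the call raises the TypeError shown above.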
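To confirm this kind of mismatch before running a full computation, you can inspect whether a function accepts a given keyword argument. The sketch below is only an illustration: old_reader and new_reader are hypothetical stand-ins for the reader before and after the scheme parameter was added. They are not fastparquet's actual code, and their parameter lists are assumptions.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if func can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params and params[name].kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs catch-all accepts any keyword name.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins (assumed signatures, not fastparquet's real ones).
def old_reader(fn, rg, columns, open=open, assign=None):
    return None


def new_reader(fn, rg, columns, open=open, assign=None, scheme="hive"):
    return None


print(accepts_keyword(old_reader, "scheme"))  # False
print(accepts_keyword(new_reader, "scheme"))  # True
```

In a real environment you would pass the installed library's read_row_group_file in place of the stand-ins. A False result would suggest that the installed fastparquet predates the parameter and should be upgraded, for example with pip install "fastparquet>=0.1.3".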