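I am unable to read a Parquet file into a Dask DataFrame, although I can read the same file with pandas. Please advise; I don't know what I'm missing. Versions: dask == 1.0.0, pyarrow == 0.13.0, pandas == 0.23.4
Parquet file sample: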
UniqueReference DateTime Consumption
0 ABCD 2018-08-01 00:00:00 9
1 EFGH 2018-08-01 01:00:00 0
2 IJKL 2018-08-01 02:00:00 0
3 MNOP 2018-08-01 03:00:00 0
import pyarrow
import dask.dataframe as dd
data = dd.read_parquet('myfile.parquet', engine = 'pyarrow')
Error traceback:
TypeError Traceback (most recent call last)
<ipython-input-22-068eb0627791> in <module>
----> 1 data = dd.read_parquet('myfile.parquet', engine = 'pyarrow').compute()
C:\ProgramData\Anaconda3\lib\site-packages\dask\dataframe\io\parquet.py in read_parquet(path, columns, filters, categories, index, storage_options, engine, infer_divisions)
1152
1153 return read(fs, fs_token, paths, columns=columns, filters=filters,
-> 1154 categories=categories, index=index, infer_divisions=infer_divisions)
1155
1156
C:\ProgramData\Anaconda3\lib\site-packages\dask\dataframe\io\parquet.py in _read_pyarrow(fs, fs_token, paths, columns, filters, categories, index, infer_divisions)
685 pandas_metadata = json.loads(schema.metadata[b'pandas'].decode('utf8'))
686 index_names, column_names, storage_name_mapping, column_index_names = (
--> 687 _parse_pandas_metadata(pandas_metadata)
688 )
689 else:
C:\ProgramData\Anaconda3\lib\site-packages\dask\dataframe\io\parquet.py in _parse_pandas_metadata(pandas_metadata)
89 # index name
90 index_names = list(index_storage_names) # make a copy
---> 91 index_storage_names2 = set(index_storage_names)
92 column_names = [name for (storage_name, name)
93 in pairs if name not in index_storage_names2]
TypeError: unhashable type: 'dict'
Answer 0 (score: 0)
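As noted in the comments, this is tied to a very old version of Dask from several years ago. With modern versions, reading the file works fine.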
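A quick way to confirm which versions are installed, and to see why the traceback ends in `TypeError: unhashable type: 'dict'`, might look like the sketch below. The last frame of the traceback calls `set(index_storage_names)`, which fails whenever one of the entries is a dict instead of a column-name string. The helper names `installed_versions` and `reproduces_type_error` are hypothetical, and the dict passed in the example is an assumed shape for a range-index descriptor, not something taken from the question's file.

```python
from importlib import metadata


def installed_versions(packages=("dask", "pyarrow", "pandas")):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def reproduces_type_error(index_columns):
    """Return True if set() rejects the entries, as in the traceback."""
    try:
        set(index_columns)
    except TypeError:
        # Raised when an entry (for example a dict) is not hashable.
        return True
    return False


if __name__ == "__main__":
    print(installed_versions())
    # Assumed example: an index described by a dict rather than a name.
    print(reproduces_type_error([{"kind": "range", "start": 0, "stop": 4, "step": 1}]))
    # A plain string entry is hashable, so set() accepts it.
    print(reproduces_type_error(["__index_level_0__"]))
```

If the reported Dask version is still 1.0.0, upgrading Dask and then re-running the original `dd.read_parquet('myfile.parquet', engine='pyarrow')` call is the fix this answer points to.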