How to create a Parquet file from CSV using our own schema in pyarrow

Posted: 2019-06-07 14:05:17

Tags: python python-3.x csv parquet pyarrow

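I am trying to convert a source Parquet file to CSV and then convert the output CSV back to Parquet. When I compare the original Parquet file with the generated one, the value types no longer match the schema. How can I write the new Parquet file with exactly the same schema as the original source?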

Here is the code:

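import os

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# source_path, onlyfiles, target_path and target_filename are defined earlier in the script
schema_full = pq.read_schema(os.path.join(source_path, onlyfiles[0]))

# Merge all source Parquet files into one DataFrame and write it out as CSV
full_df = pd.concat(pd.read_parquet(os.path.join(source_path, parquet_file))
                    for parquet_file in onlyfiles)
csv_path = os.path.join(target_path, target_filename + '.csv')
full_df.to_csv(csv_path, index=False)

# Read the CSV back (read_csv has no index argument) and convert it to an Arrow table
target_df = pd.read_csv(csv_path)
target_table = pa.Table.from_pandas(target_df, preserve_index=False)
# Table.cast returns a new table, so keep the result
target_table = pa.Table.cast(target_table, schema_full)
print(target_table)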

This is the error I get:

Traceback (most recent call last):
  File "C:\Users\*****\Documents\Python Scripts\parq_ascii_pyarrow.py", line 31, in <module>
    pa.Table.cast(target_table,schema_full)
  File "pyarrow\table.pxi", line 1390, in itercolumns
  File "pyarrow\table.pxi", line 410, in pyarrow.lib.Column.cast
  File "pyarrow\error.pxi", line 89, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: No cast implemented from double to decimal(8, 2)
