I have a .parquet file and I am using PyArrow. I converted the .parquet file into a table with the following code:
import pyarrow.parquet as pq
import pandas as pd
from pandas import Series, DataFrame

filepath = "xxx"  # This contains the exact location of the file on the server
table = pq.read_table(filepath)
Running table.shape returns (39014 rows, 19 columns).
The schema of the table is:
col1: int64 not null
col2: string not null
col3: string not null
col4: int64 not null
col5: string not null
col6: string not null
col7: int64 not null
col8: int64 not null
col9: string not null
col10: string not null
col11: string not null
col12: string not null
col13: string not null
col14: string not null
col15: string not null
col16: int64 not null
col17: int64 not null
col18: int64 not null
col19: string not null
Converting this table to a pandas DataFrame raises the following error:
ImportError: cannot import name RangeIndex
How can I convert this Parquet file to a DataFrame and then to CSV? Please help. Thanks.
Answer (score: 2)
Try the following:
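import pyarrow.parquet as pq
import pandas as pd  # Table.to_pandas() needs a pandas version compatible with PyArrow

def read_pyarrow(path, use_threads=False):
    # Read the Parquet file into an Arrow table, then convert it to a DataFrame.
    # Older PyArrow releases called this parameter `nthreads`; current ones use `use_threads`.
    return pq.read_table(path, use_threads=use_threads).to_pandas()

path = './test.parquet'
df1 = read_pyarrow(path)

# Write a pipe-separated CSV without the DataFrame index.
# `lineterminator` is the pandas >= 1.5 name; older pandas uses `line_terminator`.
df1.to_csv(
    './test.csv',
    sep='|',
    index=False,
    mode='w',
    lineterminator='\n',
    encoding='utf-8')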