Converting a .parquet file to CSV using PyArrow

Time: 2017-05-05 14:16:14

Tags: python pandas parquet bigdata

I have a .parquet file and I am using PyArrow. I read the .parquet file into a table with the following code:

import pyarrow.parquet as pq
import pandas as pd
from pandas import Series, DataFrame

filepath = "xxx"  # This contains the exact location of the file on the server
table = pq.read_table(filepath)

Running table.shape returns:

(39014 rows, 19 columns)

The schema of the table is:

col1: int64 not null
col2: string not null
col3: string not null
col4: int64 not null
col5: string not null
col6: string not null
col7: int64 not null
col8: int64 not null
col9: string not null
col10: string not null
col11: string not null
col12: string not null
col13: string not null
col14: string not null
col15: string not null
col16: int64 not null
col17: int64 not null
col18: int64 not null
col19: string not null

Converting the table to a pandas DataFrame fails with the following error:

ImportError: cannot import name RangeIndex

How do I convert this Parquet file to a DataFrame and then to CSV? Please help. Thanks.

1 Answer:

Answer 0: (score: 2)

Try the following:

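import pyarrow.parquet as pq

def read_pyarrow(path, use_threads=False):
    # Read the Parquet file into an Arrow table, then convert it to a pandas DataFrame.
    # Older PyArrow releases took nthreads=1 here instead of use_threads.
    return pq.read_table(path, use_threads=use_threads).to_pandas()

path = './test.parquet'
df1 = read_pyarrow(path)

# Write the DataFrame as pipe-delimited CSV without the index column.
df1.to_csv(
    './test.csv',
    sep='|',
    index=False,
    mode='w',
    lineterminator='\n',  # spelled line_terminator before pandas 1.5
    encoding='utf-8')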