Python - Read a Parquet file without pandas

Time: 2018-06-22 12:32:25

Tags: python pandas parquet

Currently, I am using the following code to read a Parquet file with Python 3.5 on Windows.

import pandas as pd

parquetfilename = 'File1.parquet'
parquetFile = pd.read_parquet(parquetfilename, columns=['column1', 'column2'])  

However, I would like to do this without pandas. What is the best way? I am using both Python 2.7 and 3.6 on Windows.

1 answer:

Answer 0: (score: 0)

You can use duckdb for this. It is an embedded RDBMS similar to SQLite, but designed with OLAP in mind. It has a nice Python API and a SQL function for importing Parquet files:

import duckdb

conn = duckdb.connect(":memory:") # or a file name to persist the DB

# Keep in mind this doesn't support partitioned datasets,
# so you can only read one partition at a time
conn.execute("CREATE TABLE mydata AS SELECT * FROM parquet_scan('/path/to/mydata.parquet')")

# Export a query as CSV
conn.execute("COPY (SELECT * FROM mydata WHERE col = 'val') TO 'col_val.csv' WITH (HEADER 1, DELIMITER ',')")
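If the goal is to get the rows into plain Python objects rather than a table or a CSV file, the same `parquet_scan` function can be queried directly and the result fetched as a list of tuples. The following is a minimal sketch, not a definitive implementation: the helper names `build_parquet_query` and `read_parquet_rows` are hypothetical, and the file and column names are taken from the question. It assumes a duckdb build is available for your Python version.

```python
def build_parquet_query(path, columns=None):
    """Build a SELECT over parquet_scan for one file (hypothetical helper)."""
    if columns:
        # Double-quote each identifier, doubling any embedded double quotes.
        select_list = ", ".join('"' + c.replace('"', '""') + '"' for c in columns)
    else:
        select_list = "*"
    # Single-quote the path as a SQL string literal, doubling embedded quotes.
    quoted_path = "'" + path.replace("'", "''") + "'"
    return "SELECT {} FROM parquet_scan({})".format(select_list, quoted_path)


def read_parquet_rows(path, columns=None):
    """Return the selected columns as a list of tuples, without pandas."""
    # Imported inside the function so the query builder above can be used
    # (and checked) even where duckdb is not installed.
    import duckdb

    conn = duckdb.connect(":memory:")
    try:
        return conn.execute(build_parquet_query(path, columns)).fetchall()
    finally:
        conn.close()
```

For the file in the question, `read_parquet_rows('File1.parquet', columns=['column1', 'column2'])` would mirror the original `pd.read_parquet` call, returning tuples instead of a DataFrame.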