I can't read a snappy-compressed Parquet file with pyarrow on Windows.
import dask.dataframe as dd
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,100,size=(15, 4)), columns=list('ABCD'))
dd_df = dd.from_pandas(df, npartitions=1)
dd_df.to_parquet("my_df.snappy.parquet", engine="pyarrow", compression="snappy")
dd_df_copy = dd.read_parquet("my_df.snappy.parquet", engine="pyarrow")
dd_df_copy.compute() #<--- This is where it crashes
I have reproduced the problem in a clean Anaconda environment with Python 3.8. After creating the environment, I ran pip install "dask[complete]" and pip install pyarrow.
The error is:
Problem signature:
Problem Event Name: APPCRASH
Application Name: python.exe
Application Version: 3.8.3150.1013
Application Timestamp: 5ed53446
Fault Module Name: arrow.dll
Fault Module Version: 0.0.0.0
Fault Module Timestamp: 5ebd3029
Exception Code: c000001d
Exception Offset: 00000000007abfc7
OS Version: 6.3.9600.2.0.0.16.7
Locale ID: 1033
Additional Information 1: d8e4
Additional Information 2: d8e42c04b828d96accf490cd13472bea
Additional Information 3: aebe
Additional Information 4: aebe917bfb5c1b58e884baa1f9c3d3d2
When I install with conda install -c conda-forge dask pyarrow instead, I get a similar crash:
Problem signature:
Problem Event Name: APPCRASH
Application Name: python.exe
Application Version: 3.8.3150.1013
Application Timestamp: 5ed53446
Fault Module Name: arrow.dll
Fault Module Version: 0.0.0.0
Fault Module Timestamp: 5ecf56ac
Exception Code: c000001d
Exception Offset: 0000000000521587
OS Version: 6.3.9600.2.0.0.16.7
Locale ID: 1033
Additional Information 1: e863
Additional Information 2: e8638a01b9fb70505b0604ef9b98f3c6
Additional Information 3: 1e47
Additional Information 4: 1e47c852f479606e071f3ea8f80878a1
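A note on reading the two reports: both show the same Exception Code, c000001d, which is the Windows NTSTATUS value STATUS_ILLEGAL_INSTRUCTION, and both fault inside arrow.dll. One plausible reading (an assumption, not something the reports prove) is that the arrow.dll build uses a CPU instruction this machine does not support. A minimal sketch for decoding the code is below; the helper name describe_exception_code and the small lookup table are hypothetical and cover only a few well-known codes.

```python
# Hypothetical helper: translate the "Exception Code" field of a Windows
# APPCRASH report into a readable NTSTATUS name.
# The table is deliberately tiny and lists only a few well-known codes.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
    0xC0000094: "STATUS_INTEGER_DIVIDE_BY_ZERO",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
}


def describe_exception_code(code_hex):
    """Return the NTSTATUS name for a hex string such as 'c000001d'."""
    code = int(code_hex, 16)
    return KNOWN_NTSTATUS.get(code, "unknown (0x%08X)" % code)


# The code reported in both crash signatures above.
print(describe_exception_code("c000001d"))
```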
Answer 0 (score: 0)
As of July 1, 2020, updating the packages fixes this. I believe it was an update to pyarrow that resolved it.
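Since the answer does not say which release contains the fix, it may help to record what is installed before and after updating. The following is a rough sketch using only the standard library (importlib.metadata, available from Python 3.8); the helper name installed_versions is made up for illustration, and the package list is just the one from the question.

```python
from importlib import metadata  # standard library in Python 3.8+


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Packages involved in the question; run once before and once after updating.
for name, version in installed_versions(["pyarrow", "dask", "pandas"]).items():
    print(name, version or "not installed")
```

Reading the versions from package metadata avoids importing pyarrow itself, so the check does not load arrow.dll on a machine where the crash still occurs.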