Reading a snappy Parquet file on Windows crashes Python

Time: 2020-06-10 01:49:44

Tags: python dask parquet pyarrow

I cannot read a snappy-compressed Parquet file with pyarrow on Windows.

import dask.dataframe as dd
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,100,size=(15, 4)), columns=list('ABCD'))
dd_df = dd.from_pandas(df, npartitions=1)
dd_df.to_parquet("my_df.snappy.parquet", engine="pyarrow", compression="snappy")
dd_df_copy = dd.read_parquet("my_df.snappy.parquet", engine="pyarrow")
dd_df_copy.compute() #<--- This is where it crashes

I have reproduced this in a clean Anaconda environment with Python 3.8. After creating the environment, I ran pip install "dask[complete]" and then pip install pyarrow.

The error is:

Problem signature:
  Problem Event Name:   APPCRASH
  Application Name: python.exe
  Application Version:  3.8.3150.1013
  Application Timestamp:    5ed53446
  Fault Module Name:    arrow.dll
  Fault Module Version: 0.0.0.0
  Fault Module Timestamp:   5ebd3029
  Exception Code:   c000001d
  Exception Offset: 00000000007abfc7
  OS Version:   6.3.9600.2.0.0.16.7
  Locale ID:    1033
  Additional Information 1: d8e4
  Additional Information 2: d8e42c04b828d96accf490cd13472bea
  Additional Information 3: aebe
  Additional Information 4: aebe917bfb5c1b58e884baa1f9c3d3d2

When I install the packages with conda instead (conda install -c conda-forge dask pyarrow), I get a similar crash:

Problem signature:
  Problem Event Name:   APPCRASH
  Application Name: python.exe
  Application Version:  3.8.3150.1013
  Application Timestamp:    5ed53446
  Fault Module Name:    arrow.dll
  Fault Module Version: 0.0.0.0
  Fault Module Timestamp:   5ecf56ac
  Exception Code:   c000001d
  Exception Offset: 0000000000521587
  OS Version:   6.3.9600.2.0.0.16.7
  Locale ID:    1033
  Additional Information 1: e863
  Additional Information 2: e8638a01b9fb70505b0604ef9b98f3c6
  Additional Information 3: 1e47
  Additional Information 4: 1e47c852f479606e071f3ea8f80878a1
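
For reference, the Exception Code field in both reports is a Windows NTSTATUS value, and c000001d corresponds to STATUS_ILLEGAL_INSTRUCTION. One plausible reading, not a confirmed diagnosis, is that arrow.dll executed a CPU instruction this machine's processor does not support. The sketch below decodes such a code; the helper name describe_exception_code and the small lookup table are illustrative assumptions, not part of any library.

```python
# A few well-known NTSTATUS values; this is not an exhaustive table.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
    0xC0000094: "STATUS_INTEGER_DIVIDE_BY_ZERO",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
}


def describe_exception_code(code_hex):
    """Map the hex string from a crash report to a status name (hypothetical helper)."""
    code = int(code_hex, 16)  # accepts "c000001d" as well as "0xc000001d"
    return KNOWN_NTSTATUS.get(code, "unknown (0x%08X)" % code)


# The code that appears in both crash reports above.
print(describe_exception_code("c000001d"))
```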

1 Answer:

Answer 0 (score: 0)

As of July 1, 2020, updating the packages fixes this. I believe the fix came from an update to pyarrow.
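
To check whether an environment has actually picked up newer packages, it can help to print the installed versions before re-running the reproduction. This is a minimal sketch that assumes Python 3.8 or later (for importlib.metadata); the function name installed_versions is hypothetical, and the answer does not say which pyarrow release contains the fix, so no version threshold is checked here.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(["pyarrow", "dask", "pandas"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

If the versions are unchanged after an upgrade attempt, the upgrade probably went to a different environment than the one running the script.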