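Our parquet files are stored in an AWS S3 bucket and are compressed with SNAPPY. I can read the uncompressed version of the parquet files with the Python fastparquet module, but not the compressed version.
Here is the code I use for the uncompressed file: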
import s3fs
from fastparquet import ParquetFile

s3 = s3fs.S3FileSystem(key='XESF', secret='dsfkljsf')
myopen = s3.open  # fastparquet opens the file through s3fs
pf = ParquetFile('sample/py_test_snappy/part-r-12423423942834.parquet', open_with=myopen)
df = pf.to_pandas()
This runs without errors, but when I try to read the snappy-compressed version of the file:
pf = ParquetFile('sample/py_test_snappy/part-r-12423423942834.snappy.parquet', open_with=myopen)
I get an error from to_pandas():
df = pf.to_pandas()
Error message:
KeyError                                  Traceback (most recent call last)
in <module>()
----> 1 df = pf.to_pandas()

/opt/conda/lib/python3.5/site-packages/fastparquet/api.py in to_pandas(self, columns, categories, filters, index)
    293                          for (item, v) in views.items()}
    294                 self.read_row_group(rg, columns, categories, infile=f,
--> 295                                     index=index, assign=parts)
    296                 start += rg.num_rows
    297         else:

/opt/conda/lib/python3.5/site-packages/fastparquet/api.py in read_row_group(self, rg, columns, categories, infile, index, assign)
    151         core.read_row_group(
    152             infile, rg, columns, categories, self.helper, self.cats,
--> 153             self.selfmade, index=index, assign=assign)
    154         if ret:
    155             return df

/opt/conda/lib/python3.5/site-packages/fastparquet/core.py in read_row_group(file, rg, columns, categories, schema_helper, cats, selfmade, index, assign)
    300         raise RuntimeError('Going with pre-allocation!')
    301     read_row_group_arrays(file, rg, columns, categories, schema_helper,
--> 302                           cats, selfmade, assign=assign)
    303
    304     for cat in cats:

/opt/conda/lib/python3.5/site-packages/fastparquet/core.py in read_row_group_arrays(file, rg, columns, categories, schema_helper, cats, selfmade, assign)
    289         read_col(column, schema_helper, file, use_cat=use,
    290                  selfmade=selfmade, assign=out[name],
--> 291                  catdef=out[name+'-catdef'] if use else None)
    292
    293

/opt/conda/lib/python3.5/site-packages/fastparquet/core.py in read_col(column, schema_helper, infile, use_cat, grab_dict, selfmade, assign, catdef)
    196     dic = None
    197     if ph.type == parquet_thrift.PageType.DICTIONARY_PAGE:
--> 198         dic = np.array(read_dictionary_page(infile, schema_helper, ph, cmd))
    199         ph = read_thrift(infile, parquet_thrift.PageHeader)
    200         dic = convert(dic, se)

/opt/conda/lib/python3.5/site-packages/fastparquet/core.py in read_dictionary_page(file_obj, schema_helper, page_header, column_metadata)
    152     Consumes data using the plain encoding and returns an array of values.
    153     """
--> 154     raw_bytes = _read_page(file_obj, page_header, column_metadata)
    155     if column_metadata.type == parquet_thrift.Type.BYTE_ARRAY:
    156         # no faster way to read variable-length strings?

/opt/conda/lib/python3.5/site-packages/fastparquet/core.py in _read_page(file_obj, page_header, column_metadata)
     28     """Read the data page from the given file-object and convert it to raw, uncompressed bytes (if necessary)."""
     29     raw_bytes = file_obj.read(page_header.compressed_page_size)
---> 30     raw_bytes = decompress_data(raw_bytes, column_metadata.codec)
     31
     32     assert len(raw_bytes) == page_header.uncompressed_page_size, \

/opt/conda/lib/python3.5/site-packages/fastparquet/compression.py in decompress_data(data, algorithm)
     48 def decompress_data(data, algorithm='gzip'):
     49     if isinstance(algorithm, int):
---> 50         algorithm = rev_map[algorithm]
     51     if algorithm.upper() not in decompressions:
     52         raise RuntimeError("Decompression '%s' not available.  Options: %s" %

KeyError: 1
Answer 0 (score: 14)
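This error probably means that the library for decompressing SNAPPY cannot be found on your system, although the error message could obviously be clearer! The 1 in "KeyError: 1" is the Parquet codec id for SNAPPY, so fastparquet has no decompressor registered for that codec.
Depending on your system, one of the following lines may fix this for you: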
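To illustrate why a missing library surfaces as "KeyError: 1" instead of a clear "not installed" message, here is a simplified model of the codec lookup. It assumes, consistent with the rev_map[algorithm] line in the traceback, that the reader builds its id-to-name map only from codecs whose library imported successfully. The codec ids come from the Parquet format's CompressionCodec enum; the names PARQUET_CODECS, build_rev_map and lookup_codec are hypothetical, and this is a sketch, not fastparquet's actual code.

```python
# Codec ids from the Parquet format's CompressionCodec enum.
PARQUET_CODECS = {
    0: "UNCOMPRESSED",
    1: "SNAPPY",
    2: "GZIP",
    3: "LZO",
    4: "BROTLI",
    5: "LZ4",
    6: "ZSTD",
}


def build_rev_map(available_codecs):
    """Return {codec id: codec name}, keeping only codecs that can be decompressed.

    Hypothetical, simplified model: a codec whose library failed to import
    never gets an entry in the map.
    """
    return {cid: name for cid, name in PARQUET_CODECS.items() if name in available_codecs}


def lookup_codec(rev_map, codec_id):
    """Translate the codec id stored in the column metadata into a codec name."""
    return rev_map[codec_id]  # raises KeyError(codec_id) if the codec is unavailable


if __name__ == "__main__":
    # Simulate a system where python-snappy is not installed.
    rev_map = build_rev_map({"UNCOMPRESSED", "GZIP"})
    print(lookup_codec(rev_map, 2))  # GZIP
    try:
        lookup_codec(rev_map, 1)  # the column chunks are SNAPPY-compressed
    except KeyError as exc:
        print("KeyError:", exc)  # KeyError: 1
```

Once the SNAPPY library is importable, the map contains id 1 and the lookup succeeds, which is why installing the library (rather than changing the reading code) is the fix.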
conda install python-snappy
or
pip install python-snappy
If you are on Windows, the build toolchain may not work, and you may need to install it from here.
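After installing, a quick check like the following can confirm that Python can import the library and that it actually compresses and decompresses data before you retry to_pandas(). This is a sketch that assumes the python-snappy package, which is imported as snappy and provides snappy.compress and snappy.decompress; the function name check_snappy is hypothetical.

```python
def check_snappy():
    """Return a short status string describing whether python-snappy is usable."""
    try:
        import snappy  # module name provided by the python-snappy package
    except ImportError:
        return "python-snappy is not installed"

    compress = getattr(snappy, "compress", None)
    decompress = getattr(snappy, "decompress", None)
    if compress is None or decompress is None:
        # An unrelated package that is also importable as "snappy".
        return "a different 'snappy' module is installed"

    # Round-trip a small buffer to confirm the native library really works.
    payload = b"parquet " * 64
    if decompress(compress(payload)) != payload:
        return "python-snappy is installed but the round-trip failed"
    return "python-snappy is working"


if __name__ == "__main__":
    print(check_snappy())
```

If the check reports that python-snappy is working but the KeyError persists, restarting the Python or Jupyter session is worth trying, since the codec table appears to be built when fastparquet is first imported.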