I set the compression codec in SparkConf as follows:
sparkConf.set("spark.sql.parquet.compression.codec", "SNAPPY")
and again when building the SparkSession:
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .config(sparkConf)
  .config("spark.sql.parquet.compression.codec", "GZIP") // also tried "SNAPPY"
  .config("spark.io.compression.codec", "org.apache.spark.io.LZ4CompressionCodec")
  .getOrCreate()
However, I see the following in the executor stdout:
Mar 16, 2019 10:34:16 AM INFO: parquet.hadoop.codec.CodecConfig: Compression set to false
Mar 16, 2019 10:34:16 AM INFO: parquet.hadoop.codec.CodecConfig: Compression: UNCOMPRESSED
Mar 16, 2019 10:34:16 AM INFO: parquet.hadoop.ParquetOutputFormat: Parquet block size to 134217728
Mar 16, 2019 10:34:16 AM INFO: parquet.hadoop.ParquetOutputFormat: Parquet page size to 1048576
Mar 16, 2019 10:34:16 AM INFO: parquet.hadoop.ParquetOutputFormat: Parquet dictionary page size to 1048576
Mar 16, 2019 10:34:16 AM INFO: parquet.hadoop.ParquetOutputFormat: Dictionary is on
Mar 16, 2019 10:34:16 AM INFO: parquet.hadoop.ParquetOutputFormat: Validation is off
Mar 16, 2019 10:34:16 AM INFO: parquet.hadoop.ParquetOutputFormat: Writer version is: PARQUET_1_0
Mar 16, 2019 10:34:17 AM INFO: parquet.hadoop.InternalParquetRecordWriter: Flushing mem columnStore to file. allocated memory: 0
The output I am concerned about is:
Mar 16, 2019 10:34:16 AM INFO: parquet.hadoop.codec.CodecConfig: Compression set to false
Mar 16, 2019 10:34:16 AM INFO: parquet.hadoop.codec.CodecConfig: Compression: UNCOMPRESSED
Does this mean that Spark is writing uncompressed data to Parquet? If not, how can I verify that? Is there a way to inspect the Parquet metadata?
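To make that last question concrete, here is a sketch of how I understand the footer could be inspected with the parquet-hadoop API. This assumes org.apache.parquet:parquet-hadoop is on the classpath (older Parquet builds, like the one in my logs, use the parquet.hadoop package prefix instead of org.apache.parquet.hadoop), and the file path below is just a placeholder:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.format.converter.ParquetMetadataConverter
import org.apache.parquet.hadoop.ParquetFileReader
import scala.collection.JavaConverters._

// Placeholder path: point this at one of the written part files.
val file = new Path("/tmp/output/part-00000.parquet")

// Read only the footer (metadata), not the data pages.
val footer = ParquetFileReader.readFooter(new Configuration(), file, ParquetMetadataConverter.NO_FILTER)

// Each row group records, per column chunk, the codec it was written with,
// so this should print e.g. GZIP or UNCOMPRESSED for every column.
for (block <- footer.getBlocks.asScala; col <- block.getColumns.asScala)
  println(s"${col.getPath}: ${col.getCodec}")

Alternatively, I believe the parquet-tools CLI (parquet-tools meta <file>) prints the same per-column codec information.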