Spark-SQL Parquet compression setting does not appear to work

Posted: 2019-03-16 18:35:12

Tags: apache-spark apache-spark-sql parquet

I set the compression codec in my SparkConf as follows:

sparkConf.set("spark.sql.parquet.compression.codec", "SNAPPY")

and also when building the `SparkSession`, like this:

val spark = SparkSession
  .builder()
  .config(sparkConf)
  .config("spark.sql.parquet.compression.codec", "GZIP") //SNAPPY
  .config("spark.io.compression.codec", "org.apache.spark.io.LZ4CompressionCodec")
  .getOrCreate()

However, despite these settings, I see the following in the executor stdout:

Mar 16, 2019 10:34:16 AM INFO: parquet.hadoop.codec.CodecConfig: Compression set to false
Mar 16, 2019 10:34:16 AM INFO: parquet.hadoop.codec.CodecConfig: Compression: UNCOMPRESSED
Mar 16, 2019 10:34:16 AM INFO: parquet.hadoop.ParquetOutputFormat: Parquet block size to 134217728
Mar 16, 2019 10:34:16 AM INFO: parquet.hadoop.ParquetOutputFormat: Parquet page size to 1048576
Mar 16, 2019 10:34:16 AM INFO: parquet.hadoop.ParquetOutputFormat: Parquet dictionary page size to 1048576
Mar 16, 2019 10:34:16 AM INFO: parquet.hadoop.ParquetOutputFormat: Dictionary is on
Mar 16, 2019 10:34:16 AM INFO: parquet.hadoop.ParquetOutputFormat: Validation is off
Mar 16, 2019 10:34:16 AM INFO: parquet.hadoop.ParquetOutputFormat: Writer version is: PARQUET_1_0
Mar 16, 2019 10:34:17 AM INFO: parquet.hadoop.InternalParquetRecordWriter: Flushing mem columnStore to file. allocated memory: 0

The lines of concern in that output are:

Mar 16, 2019 10:34:16 AM INFO: parquet.hadoop.codec.CodecConfig: Compression set to false
Mar 16, 2019 10:34:16 AM INFO: parquet.hadoop.codec.CodecConfig: Compression: UNCOMPRESSED

Does this mean that Spark is writing uncompressed data to the Parquet files? If not, how can I verify that? Is there a way to inspect the Parquet metadata?
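One way I am considering to check what actually ended up in the files is to read the footer of one of the output part files and print the codec recorded for each column chunk. A sketch, assuming the parquet-mr library (package org.apache.parquet in recent versions) is on the classpath; the file path is a placeholder:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.format.converter.ParquetMetadataConverter.NO_FILTER
import org.apache.parquet.hadoop.ParquetFileReader
import scala.collection.JavaConverters._

// Read the footer of a single part file and print the compression codec
// stored for each column chunk.
val footer = ParquetFileReader.readFooter(
  new Configuration(),
  new Path("/tmp/parquet-compression-test/part-00000.parquet"), // placeholder path
  NO_FILTER)

footer.getBlocks.asScala.foreach { block =>
  block.getColumns.asScala.foreach { col =>
    println(s"${col.getPath} -> ${col.getCodec}")
  }
}

A simpler sanity check, if I understand Spark's output file naming correctly, is that compressed part files usually carry the codec in their name (e.g. ending in .snappy.parquet), while uncompressed ones do not.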

0 Answers:

There are no answers yet.