How to save CSV output with snappy compression in PySpark 2.0 or later

Date: 2018-12-17 00:14:54

Tags: python pyspark pyspark-sql

I have been trying the code below -

riders.write.csv(path="/loudacre/devices4_csv", sep=",", mode="overwrite", compression="snappy")

Error -

18/12/22 13:54:38 ERROR executor.Executor: Exception in task 0.0 in stage 10.0 (TID 10)
java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
    at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65)
    at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:134)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:150)
    at org.apache.hadoop.io.compress.CompressionCodec$Util.createOutputStreamWithCodecPool(CompressionCodec.java:131)
    at org.apache.hadoop.io.compress.SnappyCodec.createOutputStream(SnappyCodec.java:100)
    at org.apache.spark.sql.execution.datasources.CodecStreams$$anonfun$createOutputStream$1.apply(CodecStreams.scala:84)
    at org.apache.spark.sql.execution.datasources.CodecStreams$$anonfun$createOutputStream$1.apply(CodecStreams.scala:84)
    at scala.Option.map(Option.scala:146)
    at org.apache.spark.sql.execution.datasources.CodecStreams$.createOutputStream(CodecStreams.scala:84)
    at org.apache.spark.sql.execution.datasources.CodecStreams$.createOutputStreamWriter(CodecStreams.scala:92)
    at org.apache.spark.sql.execution.datasources.csv.CsvOutputWriter.<init>(CSVFileFormat.scala:177)
    at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anon$1.newInstance(CSVFileFormat.scala:85)
    at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:120)
    at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.<init>(FileFormatDataWriter.scala:108)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:233)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:169)

1 Answer:

Answer 0 (score: 0)

Check whether snappy is installed: hadoop checknative -a
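
For orientation, a sketch of what hadoop checknative -a typically prints (the library paths shown are illustrative and will differ per host); the line to look for is snappy, which must report true:

Native library checking:
hadoop:  true /usr/hdp/current/hadoop-client/lib/native/libhadoop.so
zlib:    true /lib64/libz.so.1
snappy:  false
lz4:     true revision:99
bzip2:   false
openssl: false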

If it is not installed: yum install snappy snappy-devel

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_command-line-installation/content/install_compression_libraries.html

If it is installed but Spark is not picking it up, you can add it manually in spark-defaults.conf. Adjust the paths below to match your installation.

spark.driver.extraClassPath=/usr/hdp/current/hadoop-client/lib/snappy*.jar
spark.driver.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native
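
Note that these two properties are read when the driver JVM launches, so in client mode they take effect via spark-defaults.conf or the spark-submit command line, not from SparkConf set inside an already-running job. A minimal sketch of the command-line equivalent, reusing the HDP paths above and also setting the executor-side library path since the failing task in the error ran on an executor (your_job.py is a placeholder for your script):

spark-submit \
  --driver-class-path "/usr/hdp/current/hadoop-client/lib/snappy*.jar" \
  --conf spark.driver.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native \
  --conf spark.executor.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native \
  your_job.py

Once the native library loads, the riders.write.csv(..., compression="snappy") call from the question should succeed and produce part files with a .snappy suffix.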