How to read Spark log files (.lz4 or .snappy)?

Date: 2019-01-23 14:58:55

Tags: python apache-spark snappy lz4

I want to read some log files but can't. So far I have tried:

  • hadoop fs -text <file>

but the only thing I get is: INFO compress.CodecPool: Got brand-new decompressor [.lz4] (same with .snappy)
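Before trying more readers, it can help to sniff the file's first bytes to see which container it actually is. This is a diagnostic sketch (not from the original question); the magic numbers are the documented ones for each format, and the path is a placeholder:

    # Diagnostic sketch: guess the container format from the file's magic bytes.
    # Download a file locally first, e.g. with `hadoop fs -get`.
    def sniff(path):
        with open(path, "rb") as f:
            head = f.read(10)
        if head.startswith(b"SEQ"):
            return "Hadoop SequenceFile"
        if head.startswith(b"\x04\x22\x4d\x18"):
            return "standard LZ4 frame (the lz4 CLI can decode this)"
        if head.startswith(b"\xff\x06\x00\x00sNaPpY"):
            return "framed Snappy stream"
        return "no known magic; likely Hadoop block-codec framing"

    print(sniff("app.log.lz4"))  # placeholder path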

  • val rawRdd = spark.sparkContext.sequenceFile[BytesWritable, String](<file>)

which returns <file> is not a SequenceFile
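So the files are evidently not SequenceFiles but plain compressed text, which points at the textFile family of readers instead. A minimal PySpark sketch (the path is a placeholder, and it can only work once the native-library problem in the next attempt is resolved):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # TextInputFormat picks the codec from the file extension
    # (.lz4 -> Lz4Codec, .snappy -> SnappyCodec), so no explicit
    # decompression call is needed here.
    rdd = spark.sparkContext.textFile("hdfs:///logs/app.log.lz4")  # placeholder
    print(rdd.take(5))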

  • val rawRdd = spark.read.textFile(<file>)

In this case I get java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
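That UnsatisfiedLinkError means the JVM cannot find the native Hadoop/Snappy libraries; it says nothing about the file itself. One way to address it, sketched below, is to point both the driver and the executors at the directory containing libhadoop.so / libsnappy.so. The directory is an assumption and varies by distribution:

    from pyspark.sql import SparkSession

    # Sketch: make the native Hadoop libraries visible to Spark. The path
    # below is an assumption (typical Apache/Cloudera layout); adjust it
    # to wherever libhadoop.so / libsnappy.so live on your machines.
    native = "/usr/lib/hadoop/lib/native"
    spark = (
        SparkSession.builder
        .config("spark.driver.extraLibraryPath", native)
        .config("spark.executor.extraLibraryPath", native)
        .getOrCreate()
    )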

  • Downloading the file to the local filesystem, decompressing it with lz4 -d <file>, and trying to view the contents
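That failure is expected: Hadoop's Lz4Codec writes its own block framing (a 4-byte big-endian uncompressed length, then a 4-byte compressed length, then a raw LZ4 block), not the LZ4 frame format the lz4 CLI understands. Below is a sketch that unpacks it with the python-lz4 package, assuming one compressed chunk per block, which holds for files written with Hadoop's default buffer size; a fully general reader would loop over chunks:

    import struct

    import lz4.block  # pip install lz4

    def decompress_hadoop_lz4(path):
        """Sketch: unpack Hadoop Lz4Codec block framing (assumes one
        compressed chunk per uncompressed block)."""
        out = bytearray()
        with open(path, "rb") as f:
            while True:
                header = f.read(4)
                if len(header) < 4:
                    break  # clean end of file
                (raw_len,) = struct.unpack(">i", header)
                (comp_len,) = struct.unpack(">i", f.read(4))
                chunk = f.read(comp_len)
                out += lz4.block.decompress(chunk, uncompressed_size=raw_len)
        return bytes(out)

    print(decompress_hadoop_lz4("app.log.lz4")[:200])  # placeholder path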

  • I followed this SO post:

    import snappy

    with open(snappy_file, "rb") as input_file:  # "rb": the data is binary
        data = input_file.read()

    decompressor = snappy.hadoop_snappy.StreamDecompressor()
    uncompressed = decompressor.decompress(data)

but when I print(uncompressed), all I get is b'' (an empty bytes object)
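An empty result usually means the bytes were not in the Hadoop-snappy framing the decompressor expects (or the file was read in text mode, fixed above). Recent python-snappy releases also ship a file-to-file helper, sketched here on the assumption that the data really is Hadoop-framed snappy and that your installed version provides hadoop_stream_decompress (check your version; snappy_file is the path from the snippet above):

    import snappy

    # Sketch: stream the whole file through python-snappy's Hadoop-framing
    # helper. hadoop_stream_decompress is an assumption: it exists in recent
    # python-snappy releases but not in older ones.
    with open(snappy_file, "rb") as src, open("uncompressed.log", "wb") as dst:
        snappy.hadoop_stream_decompress(src, dst)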

0 Answers:

No answers yet