I'm using Scala. I need to read a large gzip file and convert it to a string, dropping the first line. This is how I read the file:
val fis = new FileInputStream(filename)
val gz = new GZIPInputStream(fis)
Then I tried Source.fromInputStream(gz).getLines.drop(1).mkString(""), but it causes an OutOfMemoryError.
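So I thought of reading the file line by line into a byte array and then converting it to a single string at the end, but I don't know how to do that. Any suggestions? Better approaches are also welcome.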
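Note that the accumulation described in the question cannot by itself avoid the out-of-memory error: whatever intermediate buffer is used, the final String still has to hold the entire decompressed text. For files that do fit in the heap, a minimal sketch of the line-by-line approach might look like this. The object and method names are hypothetical, UTF-8 is assumed, and unlike mkString("") it keeps a "\n" after each line.

```scala
import java.io.{BufferedReader, FileInputStream, InputStreamReader}
import java.nio.charset.StandardCharsets
import java.util.zip.GZIPInputStream

object ReadGzipWithoutFirstLine {
  // Hypothetical helper: decompresses `path`, discards the first line and
  // returns the remaining lines as one String, appending "\n" after each line.
  // The whole result is kept in memory, so this only works if it fits in the heap.
  def readAllButFirstLine(path: String): String = {
    val reader = new BufferedReader(
      new InputStreamReader(new GZIPInputStream(new FileInputStream(path)), StandardCharsets.UTF_8))
    try {
      reader.readLine() // discard the first line (returns null if the file is empty)
      val sb = new StringBuilder
      var line = reader.readLine()
      while (line != null) {
        sb.append(line).append('\n')
        line = reader.readLine()
      }
      sb.toString
    } finally reader.close() // also closes the GZIP and file streams underneath
  }
}
```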
Answer (score: 2)
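If your gzipped file is large, you can stream it through a BufferedReader instead of building one String. Here is an example. It copies all characters from the gzipped file into an uncompressed file but skips the first line, so only a small buffer is held in memory at any time.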
import java.util.zip.GZIPInputStream
import java.io._
import java.nio.charset.StandardCharsets
import scala.annotation.tailrec
import scala.util.Try

val bufferSize = 4096
val pathToGzFile = "/tmp/text.txt.gz"
val pathToOutputFile = "/tmp/text_without_first_line.txt"
val charset = StandardCharsets.UTF_8

val inStream = new FileInputStream(pathToGzFile)
val outStream = new FileOutputStream(pathToOutputFile)
try {
  val inGzipStream = new GZIPInputStream(inStream)
  val inReader = new InputStreamReader(inGzipStream, charset)
  val outWriter = new OutputStreamWriter(outStream, charset)
  val bufferedReader = new BufferedReader(inReader)
  val closeables = Array[Closeable](inGzipStream, inReader, outWriter, bufferedReader)

  // Read the first line here so that copy() never sees it: it is skipped
  val firstLine = bufferedReader.readLine()
  println(s"First line: $firstLine")

  @tailrec
  def copy(in: Reader, out: Writer, buffer: Array[Char]): Unit = {
    // Keep copying until end of file (read returns -1)
    val readChars = in.read(buffer, 0, buffer.length)
    if (readChars > 0) {
      out.write(buffer, 0, readChars)
      copy(in, out, buffer)
    }
  }

  // Copy chars from bufferedReader to outWriter through a fixed-size buffer
  copy(bufferedReader, outWriter, Array.ofDim[Char](bufferSize))
  // Flush explicitly so a write error is not silently swallowed by Try(close()) below
  outWriter.flush()
  // Close all closeables
  closeables.foreach(c => Try(c.close()))
}
finally {
  Try(inStream.close())
  Try(outStream.close())
}
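If line-by-line processing is acceptable, a shorter variant of the same streaming idea can use BufferedReader.readLine together with scala.util.Using.Manager (Scala 2.13+) for resource cleanup. This is a sketch rather than the answer's code: the helper name is hypothetical, UTF-8 is assumed, and every line terminator is rewritten as "\n". Only one line is held in memory at a time.

```scala
import java.io.{BufferedReader, BufferedWriter, FileInputStream, FileOutputStream, InputStreamReader, OutputStreamWriter}
import java.nio.charset.StandardCharsets
import java.util.zip.GZIPInputStream
import scala.util.Using

object StreamWithoutFirstLine {
  // Hypothetical helper: streams the decompressed lines of `gzPath` into `outPath`,
  // skipping the first line, and returns how many lines were written.
  // Using.Manager closes the registered resources in reverse order (writer first,
  // which flushes it); .get rethrows any exception raised inside the block.
  def copyWithoutFirstLine(gzPath: String, outPath: String): Int =
    Using.Manager { use =>
      val reader = use(new BufferedReader(new InputStreamReader(
        new GZIPInputStream(use(new FileInputStream(gzPath))), StandardCharsets.UTF_8)))
      val writer = use(new BufferedWriter(new OutputStreamWriter(
        use(new FileOutputStream(outPath)), StandardCharsets.UTF_8)))

      reader.readLine() // skip the first line
      var count = 0
      var line = reader.readLine()
      while (line != null) {
        writer.write(line)
        writer.write("\n") // terminators are normalized to "\n"
        count += 1
        line = reader.readLine()
      }
      count
    }.get
}
```

For example, StreamWithoutFirstLine.copyWithoutFirstLine("/tmp/text.txt.gz", "/tmp/text_without_first_line.txt") writes the same output file as the answer's code, except that Windows-style "\r\n" line endings would become "\n".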