Question

I have a large data file, and I respond to GET requests with a very small portion of that file as an Array[Byte].

The directive is:

get {
  dataRepo.load(param).map(data =>
    complete(
      HttpResponse(
        entity = HttpEntity(myContentType, data),
        headers = List(gzipContentEncoding)
      )
    )
  ).getOrElse(complete(HttpResponse(status = StatusCodes.NoContent)))
}

dataRepo.load is a function along the lines of:

val pointers: Option[(Long, Int)] = calculateFilePointers(param)
pointers.map { case (index, length) =>
  val dataReader = new RandomAccessFile(dataFile, "r")
  try {
    dataReader.seek(index)
    val data = Array.ofDim[Byte](length)
    dataReader.readFully(data)
    data
  } finally dataReader.close() // release the file handle once the slice is read
}

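Is there a more efficient way to stream the bytes from the RandomAccessFile directly back in the response, rather than having to read the whole slice into memory first?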

Answer 1

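Instead of reading the data into a single Array[Byte], you can create an Iterator[Array[Byte]] that reads one chunk of the file at a time (index and length are the values returned by calculateFilePointers):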

val dataReader = new RandomAccessFile(dataFile, "r")

val chunkSize = 1024

// Iterate over offsets relative to the start of the slice, so that the
// range stays an Int even though index is a Long.
Iterator
  .range(0, length, chunkSize)
  .map { offset =>
    // The last chunk may be shorter than chunkSize.
    val currentBytes =
      Array.ofDim[Byte](Math.min(chunkSize, length - offset))

    dataReader.seek(index + offset)
    dataReader.readFully(currentBytes)

    currentBytes
  }

This iterator can now feed an akka Source:

val source : Source[Array[Byte], _] = 
  Source fromIterator (() => dataRepo.load(param))

which can then feed an HttpEntity:

val byteStrSource : Source[ByteString, _] = source.map(ByteString.apply)

val httpEntity = HttpEntity(myContentType, byteStrSource)

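Each client now holds roughly one 1024-byte chunk in memory at a time instead of the whole requested slice. This makes your server considerably more efficient when handling many concurrent requests, and it lets your dataRepo.load return immediately with a lazy Source value instead of utilizing a Future.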
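One thing the chunked iterator above leaves open is when the RandomAccessFile gets closed. The following is a minimal sketch, using only the standard library, of a chunk iterator that closes the file once the last chunk has been read. The names ChunkedRead and chunks are hypothetical and not part of the original answer, and the sketch assumes the iterator is consumed to the end.

```scala
import java.io.RandomAccessFile
import java.nio.file.Path

object ChunkedRead {
  /** Lazily reads `length` bytes starting at `index`,
    * yielding at most `chunkSize` bytes per element.
    */
  def chunks(file: Path, index: Long, length: Int, chunkSize: Int = 1024): Iterator[Array[Byte]] = {
    val reader = new RandomAccessFile(file.toFile, "r")
    var remaining = length

    // Nothing to read: release the handle right away.
    if (remaining <= 0) reader.close()
    else reader.seek(index)

    new Iterator[Array[Byte]] {
      def hasNext: Boolean = remaining > 0

      def next(): Array[Byte] = {
        if (!hasNext) throw new NoSuchElementException("no more chunks")
        // The last chunk may be shorter than chunkSize.
        val buf = Array.ofDim[Byte](math.min(chunkSize, remaining))
        try reader.readFully(buf)
        catch { case e: Throwable => reader.close(); throw e }
        remaining -= buf.length
        // Close the file as soon as the final chunk has been produced.
        if (remaining == 0) reader.close()
        buf
      }
    }
  }
}
```

Passing it as Source.fromIterator(() => ChunkedRead.chunks(path, index, length)) defers opening the file until the stream is materialized. If the client disconnects before the last chunk, this sketch would still leak the handle; Akka Streams offers Source.unfoldResource for resources that need to be closed on cancellation or failure, which may be the better fit in production code.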
