Using FilterInputStream to drop the remainder of an InputStream

Asked: 2013-05-19 16:19:52

Tags: java yaml java-io

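I plan to use Java to process Markdown text files that specify additional metadata, such as title, author, and creation date, in YAML format at the start of the document. Here is an example: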

---
title: An example document
author: Paul
created: 2013-05-19
---

The _body_ of this document is
written in **Markdown**.

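To parse the YAML data I can use snakeyaml. As far as I know, the methods yaml.load() and yaml.loadAll() can load YAML documents from a java.io.InputStream, a java.io.Reader, or a String (see the SnakeYAML documentation).

I don't want to use the variant that reads from a String, because that would cause performance problems with large files. But passing the file as an InputStream fails, because the stream as a whole is not a valid YAML document. Only the first part of the stream is.

So my question is: how can I use java.io.FilterInputStream / java.io.FilterReader, or some other approach, to produce a stream that stops after the second ---, so that the whole stream is valid YAML?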

2 answers:

Answer 0 (score: 1)

Add "..." (three dots) at the point where you want the YAML parser to stop.

Answer 1 (score: 0)

Here is my solution (Scala code):

import java.io.InputStreamReader
import java.io.InputStream
import java.nio.charset.Charset

import scala.collection.mutable.Queue

/**
 * Reader for Metadata that is contained in the given `InputStream`.
 *
 * @constructor Create a new metadata reader with a given `Charset`.
 * @param in underlying input stream
 * @param charset encoding of the stream
 */
class MetadataReader(in: InputStream, charset: Charset)
    extends InputStreamReader(in, charset) {
  private val lookahead = Queue.empty[Int] // buffer for looking ahead
  // number of dividers ("---") seen so far; note that any run of three '-'
  // counts as a divider, not only one at the start of a line
  private var divider = 0

  /**
   * Create a new MetadataReader with the system's default `Charset`.
   *
   * @param in underlying input stream
   */
  def this(in: InputStream) = this(in, Charset.defaultCharset())

  /**
   * Read the next character.
   *
   * @return next character
   */
  override def read: Int =
    if (divider == 2) {
      -1
    } else if (!lookahead.isEmpty) {
      lookahead.dequeue
    } else {

      // read next character
      def readNext: Int =
        if (lookahead.length == 3) {
          divider += 1
          read
        } else {
          val c = super.read
          if (c == '-') {
            lookahead.enqueue(c)
            readNext
          } else {
            lookahead.enqueue(c)
            lookahead.dequeue
          }
        }

      readNext
    }

  /**
   * Read characters into a buffer character array.
   *
   * @param buf buffer array
   * @param off offset in the array at which to start storing characters
   * @param len maximum number of characters to read
   * @return number of characters read, or -1 at the end of the stream
   */
  override def read(buf: Array[Char], off: Int, len: Int): Int = {
    var n = 0
    while (n < len) {
      val c = read

      if (c == -1)
        return if (n == 0) -1 else n // -1 only if nothing was read at all

      buf(off + n) = c.toChar // store relative to the requested offset
      n += 1
    }

    n
  }
}

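You can use it like this:

import java.io.{File, FileInputStream}
import java.nio.charset.Charset

import org.yaml.snakeyaml.Yaml

val yaml = new Yaml
val mr = new MetadataReader(new FileInputStream(
  new File("src/test/resources/yaml-test.txt")), Charset.forName("UTF-8"))
println(yaml.load(mr))
mr.close()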

Feedback is welcome.
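Since the question asks about Java, the same idea can also be sketched as a line-based Reader in plain Java. This is only a minimal sketch under a few assumptions: the class name FrontMatterReader is hypothetical, a divider is a line consisting of exactly "---" (no trailing whitespace), and line endings are normalized to "\n". Unlike matching any run of three dashes, it only reacts to a divider that sits on a line of its own, and it reports end of stream at the second one.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

/**
 * Hypothetical helper (not part of SnakeYAML or the JDK): exposes only the
 * front matter of a document, i.e. everything before the second "---" line.
 */
class FrontMatterReader extends Reader {
    private final BufferedReader in;
    private String pending = ""; // current line plus '\n', not yet fully handed out
    private int pos = 0;         // read position inside 'pending'
    private int dividers = 0;    // number of "---" lines seen so far
    private boolean done = false;

    FrontMatterReader(Reader source) {
        this.in = new BufferedReader(source);
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        // Fetch the next line once the current one is used up.
        while (pos == pending.length()) {
            if (done) {
                return -1;
            }
            String line = in.readLine();
            // Stop at the real end of input or at the second divider line.
            if (line == null || (line.equals("---") && ++dividers == 2)) {
                done = true;
                return -1;
            }
            pending = line + "\n";
            pos = 0;
        }
        int n = Math.min(len, pending.length() - pos);
        pending.getChars(pos, pos + n, buf, off);
        pos += n;
        return n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

Assuming SnakeYAML is on the classpath, such a reader could be passed to yaml.load(Reader) in the same way as the MetadataReader above. The BufferedReader reads ahead by one buffer, but the body of a large file is never loaded as a whole.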