Question

I have an XML file that contains a very large text node (> 10 MB). Is it possible to skip (ignore) this node while reading the file?

I tried the following:

 reader = XML::Reader.io(path)
 while reader.read do
  next if reader.name.eql?('huge-node')
 end

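But this still results in the error parser error : xmlSAX2Characters: huge text node

The only other solution I can think of is to first read the file as a string, remove the huge node with gsub, and then parse the result. However, that approach seems very inefficient.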
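For reference, the gsub workaround described above could look roughly like the sketch below. The helper name strip_element is made up for illustration, and it assumes the element is never nested inside itself and is not written as a self-closing tag. It still holds the whole file in memory, which is exactly the inefficiency in question.

```ruby
# Hypothetical helper (not part of any library): removes every
# <name ...>...</name> element from an XML string before parsing.
# Assumes the element is not nested in itself and is not self-closing.
def strip_element(xml, name)
  tag = Regexp.escape(name)
  # /m lets "." match newlines; ".*?" stops at the first closing tag.
  pattern = %r{<#{tag}(\s[^>]*)?>.*?</#{tag}\s*>}m
  xml.gsub(pattern, '')
end

xml = "<root><huge-node>lots of\ntext</huge-node><small>x</small></root>"
cleaned = strip_element(xml, 'huge-node')
puts cleaned
```

The cleaned string could then be handed to the parser instead of the original file contents.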

Answer 1

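This is probably because the reader has already read the node by the time you try to skip it. According to the documentation for the #read method:

reader.read -> nil|true|false

Causes the reader to move to the next node in the stream, exposing its properties. Returns true if a node was successfully read or false if there are no more nodes to read. On errors, an exception is raised.

You would need to skip the node before calling #read. I'm sure there are plenty of ways to do that, but it doesn't look like this library supports XPath expressions, or I would have suggested something along those lines.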

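Edit: The question has been clarified so that a SAX parser is a required part of the solution. I have removed the links that are of no use under that constraint.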

Answer 2

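You don't have to skip the node. The error occurs because, since version 2.7.3, libxml limits the size of a single text node to 10 MB. The limit can be lifted with the new parser option XML_PARSE_HUGE.

Below is an example (in PHP, where the option is exposed as LIBXML_PARSEHUGE):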

# Read the entire document into a string
$result = file_get_contents("https://www.ncbi.nlm.nih.gov/gene/68943?report=xml&format=text");
# Parse the XML string into an object; LIBXML_PARSEHUGE lifts the 10 MB text node limit
$xml = simplexml_load_string($result, 'SimpleXMLElement', LIBXML_COMPACT | LIBXML_PARSEHUGE);

Ruby LibXML: skipping a huge node
