Given the first N bytes of a text file, how can I determine whether those bytes are XML?

Time: 2016-10-09 15:30:22

Tags: java xml parsing

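Given byte[] peek, where peek holds the first N bytes of a text file, how can I determine whether peek is XML?

Is it enough to just check for a < at the start of the string?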

2 answers:

Answer 0: (score: 2)

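To determine for certain whether a given string is actually XML, you need a parser (for Java, read this). That is the only way to get a definitive answer.

Checking the first few bytes for <?xml can only give you an assumption about whether the content is valid XML. You cannot be completely sure until you have parsed it to the end.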
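As a minimal sketch of the "parse it to the end" approach, the following uses the SAX parser that ships with the JDK. The class and method names (XmlWellFormedness, isWellFormed) are hypothetical and chosen only for illustration. It assumes the whole document is available: a truncated N-byte peek will normally fail this check even when the full file is XML, because the closing tags are missing. It tests well-formedness only, not validity against a DTD or schema.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of any library.
final class XmlWellFormedness {

    // Returns true if the complete content parses as well-formed XML.
    static boolean isWellFormed(byte[] content) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            SAXParser parser = factory.newSAXParser();

            // DefaultHandler rethrows fatal (well-formedness) errors as exceptions.
            DefaultHandler handler = new DefaultHandler() {
                @Override
                public InputSource resolveEntity(String publicId, String systemId) {
                    // Substitute empty content so external DTDs are never fetched.
                    return new InputSource(new StringReader(""));
                }
            };

            parser.parse(new ByteArrayInputStream(content), handler);
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            // Any parse failure means the content is not well-formed XML.
            return false;
        }
    }
}
```

For large files, the same idea works with a FileInputStream instead of an in-memory array, since SAX reads the input as a stream.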

Answer 1: (score: 2)

The XML specification recommends that documents begin with an XML declaration (<?xml ...), which makes it possible to tell that they are XML. If your files do not follow that recommendation, there is no reliable way to tell. Some non-XML files will pass any test that looks only at the first few bytes (for example, by starting with <); others won't. Also note that a valid XML file may begin with a Unicode byte order mark (BOM), so be sure to account for that if you go ahead and try this.
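If you do go ahead with a peek-based test, a rough sketch might look like the one below. The names XmlPeek, Guess, and classify are made up for this example. It assumes the file is UTF-8 or another ASCII-compatible encoding and skips only the UTF-8 BOM (EF BB BF); UTF-16 and UTF-32 input would need separate handling. The result is a guess, not proof: the middle case is exactly where non-XML files such as HTML slip through.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical heuristic, not a substitute for parsing.
final class XmlPeek {

    enum Guess { XML_DECLARATION, STARTS_WITH_ANGLE_BRACKET, NOT_XML_LIKE }

    static Guess classify(byte[] peek) {
        // Skip a UTF-8 byte order mark if one is present.
        int start = hasUtf8Bom(peek) ? 3 : 0;
        String head = new String(peek, start, peek.length - start, StandardCharsets.UTF_8);

        // An XML declaration must come first and is "<?xml" followed by whitespace;
        // the whitespace check avoids matching instructions like <?xml-stylesheet.
        if (head.startsWith("<?xml") && head.length() > 5
                && Character.isWhitespace(head.charAt(5))) {
            return Guess.XML_DECLARATION;
        }

        // Without a declaration, leading whitespace before the first tag is allowed.
        if (head.trim().startsWith("<")) {
            return Guess.STARTS_WITH_ANGLE_BRACKET;
        }
        return Guess.NOT_XML_LIKE;
    }

    private static boolean hasUtf8Bom(byte[] b) {
        return b.length >= 3
                && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB
                && (b[2] & 0xFF) == 0xBF;
    }
}
```

Treat XML_DECLARATION as a strong hint and STARTS_WITH_ANGLE_BRACKET as a weak one; only a full parse settles the question.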