I need to strip the non-XML text from a file generated by another program.
The file looks like this:
Executing Command - Blah.exe ...
-----Command Output-----
HTTP/1.1 200 OK
Connection: close
Content-Type: text/xml
<?xml version="1.0"?>
<testResults>
<finalCounts>
<right>7</right>
<wrong>4</wrong>
<ignores>0</ignores>
<exceptions>0</exceptions>
</finalCounts>
</testResults>
Exit-Code: 15
How can I easily remove the non-XML text in Java?
Answer 0: (score: 8)
// getContent() returns the complete text to strip.
//
String s = getContent();
// Find the start of the XML content using the <?xml prefix.
//
int xmlIndex = s.indexOf( "<?xml" );
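// Strip the non-XML header. indexOf() returns -1 when "<?xml" is absent,
// which would make substring() throw, so check first.
//
if ( xmlIndex < 0 ) throw new IllegalArgumentException( "No XML declaration found" );
s = s.substring( xmlIndex );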
// Find the last closing angle-bracket; should indicate end of the XML.
//
xmlIndex = s.lastIndexOf( ">" );
// Strip everything after the closing angle-bracket. The end index of
// substring() is exclusive, so add 1 to keep the '>' itself.
//
s = s.substring( 0, xmlIndex + 1 );
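The same steps can be wrapped in a single method. This is only a minimal sketch: the class name XmlExtractor and the method name extractXml are hypothetical, and it assumes that nothing after the XML (such as the Exit-Code line) contains a '>' character. It returns null rather than throwing when no XML span is found.

```java
public final class XmlExtractor {

    private XmlExtractor() {
        // Utility class, not meant to be instantiated.
    }

    /**
     * Returns the text from the "<?xml" declaration up to and including
     * the last '>' in the input, or null if no such span exists.
     */
    public static String extractXml(String content) {
        // Start of the XML: the first occurrence of the declaration prefix.
        int start = content.indexOf("<?xml");
        if (start < 0) {
            return null;
        }

        // End of the XML: the last closing angle bracket in the whole text.
        int end = content.lastIndexOf('>');
        if (end < start) {
            return null;
        }

        // substring's end index is exclusive, so use end + 1 to keep the '>'.
        return content.substring(start, end + 1);
    }
}
```

Calling XmlExtractor.extractXml(getContent()) on the file shown in the question should yield the text from <?xml version="1.0"?> through </testResults>, which can then be handed to an XML parser.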
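Answer 1: (score: 4)
This looks like raw HTTP output, so scanning for the first two consecutive newlines (each possibly preceded by a carriage return) will give you the end of the prefix to filter out.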
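One way to sketch that scan in Java is with a regular expression that matches a blank line. The class name HttpHeaderStripper and the method name stripHeaders are hypothetical. The sketch assumes the real file contains the empty line that HTTP places between the headers and the body, and that no blank line occurs earlier in the file. It removes only the prefix, so a trailer such as the Exit-Code line would still need to be trimmed separately.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class HttpHeaderStripper {

    // Two consecutive line breaks, each optionally preceded by a carriage return.
    private static final Pattern BLANK_LINE = Pattern.compile("\\r?\\n\\r?\\n");

    private HttpHeaderStripper() {
        // Utility class, not meant to be instantiated.
    }

    /**
     * Returns everything after the first blank line in the input,
     * or the input unchanged if it contains no blank line.
     */
    public static String stripHeaders(String content) {
        Matcher matcher = BLANK_LINE.matcher(content);
        if (matcher.find()) {
            // matcher.end() is the index just past the blank line.
            return content.substring(matcher.end());
        }
        return content;
    }
}
```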