How do I write the contents of a Document object to a String in NekoHTML?

Date: 2011-04-11 11:08:23

Tags: java html-parsing transformer neko

I'm using NekoHTML to parse the contents of some HTML files.

Everything works except getting the contents of the Document object into a String.

I've tried using:

TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMSource source = new DOMSource(doc);
StreamResult result = new StreamResult(writer);
transformer.transform(source, result);

but it doesn't seem to return anything.
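The question does not show how `writer` is declared or read afterwards. For comparison, here is a minimal JAXP sketch, assuming `writer` is meant to be a `java.io.StringWriter` whose text is read with `toString()` after `transform()` returns. NekoHTML's `DOMParser.getDocument()` returns an ordinary `org.w3c.dom.Document`, so a tiny DOM built with the JDK's `DocumentBuilderFactory` stands in for it here; the class `DomToString` and the helper `documentToString` are hypothetical names used only for illustration.

```java
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomToString {

    // Hypothetical helper: serializes any DOM Document (such as the one returned by
    // NekoHTML's parser.getDocument()) into a String with the JAXP identity transform.
    static String documentToString(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        StringWriter writer = new StringWriter();   // in-memory character sink
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();                   // read the text only after transform() returns
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the NekoHTML Document: a tiny DOM built with the JDK parser API.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element html = doc.createElement("HTML");
        Element body = doc.createElement("BODY");
        body.setTextContent("Hello");
        html.appendChild(body);
        doc.appendChild(html);

        System.out.println(documentToString(doc));
    }
}
```

The printed markup is preceded by an XML declaration; setting `OutputKeys.OMIT_XML_DECLARATION` to `"yes"` on the `Transformer` suppresses it.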

2 answers:

Answer 0 (score: 0)

Answer 1 (score: 0)

A working solution:

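import java.io.ByteArrayOutputStream;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.cyberneko.html.parsers.DOMParser;

// NekoHTML: parse the HTML into a DOM ("archivo" is the file to parse, e.g. its path/URI)
DOMParser parser = new DOMParser();
parser.parse(archivo);

// Xerces: configure the output format for the parsed document
OutputFormat format = new OutputFormat(parser.getDocument());
format.setIndenting(true);
format.setEncoding("UTF-8");

// to print the XML to the console instead:
//XMLSerializer serializer = new XMLSerializer(System.out, format);

// to keep the XML in a String variable, serialize into an in-memory buffer
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
XMLSerializer serializer = new XMLSerializer(outputStream, format);

// run the serialization
serializer.serialize(parser.getDocument());

// decode the bytes with the same encoding the serializer wrote
String xmlText = outputStream.toString("UTF-8");

System.out.println(xmlText);

// to write the output to a file, use a FileOutputStream instead of System.out:
//XMLSerializer serializer = new XMLSerializer(new FileOutputStream(new File("book.xml")), format);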

Source: http://totheriver.com/learn/xml/xmltutorial.html#6.2

See section e), "Serializing the DOM to a FileOutputStream to generate the XML file book.xml".
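The `OutputFormat`/`XMLSerializer` classes in this answer come from Xerces' `org.apache.xml.serialize` package, which later Xerces releases deprecate in favor of the DOM Level 3 Load and Save API. As an alternative technique (not the one the answer uses), here is a minimal sketch with the standard `LSSerializer`, assuming the `Document`'s implementation supports the "LS" feature. The JDK's built-in DOM does; that NekoHTML's Xerces-based document does too is an assumption, not something the answer confirms. The class `LsSerializeToString` and the helper `serialize` are hypothetical names.

```java
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

public class LsSerializeToString {

    // Hypothetical helper: serializes a DOM Document to a String through the
    // standard DOM Level 3 Load and Save API rather than Xerces' XMLSerializer.
    // Assumes the document's DOMImplementation supports the "LS" 3.0 feature.
    static String serialize(Document doc) {
        DOMImplementationLS ls =
                (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();
        DOMConfiguration config = serializer.getDomConfig();
        // Pretty-printing is optional in DOM LS, so enable it only when supported.
        if (config.canSetParameter("format-pretty-print", Boolean.TRUE)) {
            config.setParameter("format-pretty-print", Boolean.TRUE);
        }
        return serializer.writeToString(doc);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the NekoHTML Document, built with the JDK parser API.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element html = doc.createElement("HTML");
        Element body = doc.createElement("BODY");
        body.setTextContent("Hello");
        html.appendChild(body);
        doc.appendChild(html);

        System.out.println(serialize(doc));
    }
}
```

Checking `canSetParameter` first avoids a `DOMException` on implementations that do not support pretty-printing.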