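Whenever I create a new file in Hadoop from Java and write content to it, special characters are prepended to the beginning of the file. Is there a way to get rid of them? Here is the code: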
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.setOutputProperty(OutputKeys.METHOD, "xml");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
StringWriter writer = new StringWriter();
transformer.transform(new DOMSource(document), new StreamResult(writer));
String extractedXML = writer.getBuffer().toString().replaceAll("\\r$", "");
FSDataOutputStream fin = fs.create(new Path("/filelocation/input.txt"));
fin.writeUTF(extractedXML);
fin.close();
$ hadoop fs -cat /filelocation/input.txt|head -5
)▒hello world
input1
hello again
hello
welcome again
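Answer 0 (score: 1)
This worked for me. writeUTF does not write plain text: it first writes a two-byte length and then the string in modified UTF-8, and those two length bytes are the special characters you see at the start of the file. Just replace the lines below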
FSDataOutputStream fin = fs.create(new Path("/filelocation/input.txt"));
fin.writeUTF(extractedXML);
fin.close();
with the following code:
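// fs.create returns an FSDataOutputStream; treat it as a plain OutputStream
// so that no DataOutput methods (such as writeUTF) are used by accident.
OutputStream os = fs.create(new Path("/filelocation/input.txt"), new Progressable() {
    @Override
    public void progress() {
        // No-op progress callback.
    }
});
// The writer encodes the characters as UTF-8 and adds no length prefix.
BufferedWriter br = new BufferedWriter(new OutputStreamWriter(os, "UTF-8"));
br.write(extractedXML);
br.close(); // Flushes the writer and closes the underlying HDFS stream.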
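
To see where the extra bytes come from without a cluster, here is a minimal sketch that uses only java.io. FSDataOutputStream extends DataOutputStream, so its writeUTF behaves the same way. The class name WriteUtfPrefixDemo and the helper methods viaWriteUTF and viaWriter are made up for illustration, and an in-memory ByteArrayOutputStream stands in for the HDFS file.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares the bytes produced by writeUTF
// with the bytes produced by a UTF-8 writer.
public class WriteUtfPrefixDemo {

    // Writes the text the way the question does, via DataOutputStream.writeUTF.
    static byte[] viaWriteUTF(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for the HDFS stream
        try (DataOutputStream out = new DataOutputStream(sink)) {
            out.writeUTF(text); // two-byte length first, then modified UTF-8
        }
        return sink.toByteArray();
    }

    // Writes the text the way the answer does, via an OutputStreamWriter.
    static byte[] viaWriter(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer w = new BufferedWriter(
                new OutputStreamWriter(sink, StandardCharsets.UTF_8))) {
            w.write(text); // encoded characters only, no prefix
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = "hello world\n";
        byte[] prefixed = viaWriteUTF(text);
        byte[] plain = viaWriter(text);

        System.out.println("writeUTF bytes: " + prefixed.length); // prints 14
        System.out.println("writer bytes:   " + plain.length);    // prints 12
        // The two leading bytes are the big-endian length of the encoded string.
        System.out.printf("prefix: 0x%02X 0x%02X%n", prefixed[0], prefixed[1]); // 0x00 0x0C
    }
}
```

In a terminal those two length bytes show up as whatever characters they happen to map to, which would explain the `)▒` in the output above. Calling `write(text.getBytes(StandardCharsets.UTF_8))` directly on the stream should give the same prefix-free bytes as the writer, if you prefer not to wrap it.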