Strange characters at the beginning of a file written in Hadoop

Date: 2017-04-29 07:15:18

Tags: java hadoop mapreduce

Whenever I create a new file in Hadoop from Java and write content to it, strange characters appear at the beginning of the file. Is there a way to get rid of them? Here is the code:

TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.setOutputProperty(OutputKeys.METHOD, "xml");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");

// serialize the DOM document to a string
StringWriter writer = new StringWriter();
transformer.transform(new DOMSource(document), new StreamResult(writer));
String extractedXML = writer.getBuffer().toString().replaceAll("\\r$", "");

// write the string to HDFS
FSDataOutputStream fin = fs.create(new Path("/filelocation/input.txt"));
fin.writeUTF(extractedXML);
fin.close();


$ hadoop fs -cat /filelocation/input.txt|head -5
)▒hello world
input1
hello again
hello
welcome again

1 answer:

Answer 0 (score: 1)

This worked for me. Just replace the following lines

FSDataOutputStream fin = fs.create(new Path("/filelocation/input.txt"));
fin.writeUTF(extractedXML);
fin.close();

with the code below:

// wrap the raw HDFS stream in a UTF-8 writer instead of calling writeUTF,
// so no length prefix is written before the text
OutputStream os = fs.create(new Path("/filelocation/input.txt"), new Progressable() {
    public void progress() {
    }
});
BufferedWriter br = new BufferedWriter(new OutputStreamWriter(os, "UTF-8"));
br.write(extractedXML);
br.close();
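
The stray characters come from writeUTF itself: FSDataOutputStream inherits it from java.io.DataOutputStream, and by the DataOutput contract writeUTF first writes the encoded length as a two-byte unsigned short and only then the string in modified UTF-8, so those two prefix bytes show up as junk when the file is read as plain text. Below is a minimal sketch of that behaviour using only plain java.io (the class name WriteUtfPrefixDemo is just for illustration):

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class WriteUtfPrefixDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);

        // writeUTF prepends the encoded length as a 2-byte unsigned short,
        // then writes the characters in modified UTF-8
        out.writeUTF("hello world");
        out.close();

        byte[] bytes = buf.toByteArray();
        // the first two bytes are the length prefix (0 and 11 here)
        System.out.println("prefix: " + bytes[0] + " " + bytes[1]);
        // the actual text starts at offset 2
        System.out.println("payload: " + new String(bytes, 2, bytes.length - 2, StandardCharsets.UTF_8));
    }
}

writeUTF is also limited to strings whose encoded form fits in 65535 bytes, so writing through an OutputStreamWriter, as in the answer above, is the safer choice for plain text files in HDFS.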