Reading, writing, and transforming multiple XML documents with XSL in HDFS

Time: 2016-05-04 15:32:48

Tags: xml parsing csv xslt hdfs

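I am trying to apply a stylesheet stored in HDFS that converts XML strings to CSV. The XML documents are held in a String array. The program works perfectly when the stylesheet is applied once. However, when it iterates through the loop, the transformation fails: only the first element of the String array is converted to CSV, and after that I get the error shown below the code.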

public class xmlToCsv {

public void xmlTransform (String[] xmlFiles, String out_hdfsPath, String styleSheet) throws ParserConfigurationException, SAXException, IOException, TransformerFactoryConfigurationError, TransformerException {

        //Set system hdfs config
        Configuration configuration = new Configuration();
        FileSystem hdfs = FileSystem.get(configuration);


        //file stream for the stylesheet stored in hdfs
        Path style_path = new Path(styleSheet);
        FSDataInputStream style_stream = hdfs.open(style_path);
        StreamSource stylesource = new StreamSource(style_stream);

        for (int i = 1; i < xmlFiles.length; i++){
            //apply the xsl transformation
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.parse(new InputSource(new StringReader(xmlFiles[i])));

            Transformer transformer = TransformerFactory.newInstance().newTransformer(stylesource);
            Source source = new DOMSource(document);

            //output path of csv file
            Path out_path = new Path(out_hdfsPath+i);
            OutputStream os = hdfs.create(out_path);

            Result outputTarget = new StreamResult(os);
            transformer.transform(source, outputTarget);
            os.close();
        }

}

}

ERROR:  'Could not compile stylesheet'
FATAL ERROR:  'Stream closed'
           :Stream closed
Exception in thread "main" javax.xml.transform.TransformerConfigurationException: Stream closed
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTemplates(TransformerFactoryImpl.java:1015)
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTransformer(TransformerFactoryImpl.java:789)
        at hdfs_rw.xmlToCsv.xmlTransform(xmlToCsv.java:55)
        at hdfs_rw.hdfs_rw.main(hdfs_rw.java:64)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: Stream closed
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:839)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:903)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:704)
        at java.io.FilterInputStream.read(FilterInputStream.java:83)
        at org.apache.xerces.impl.XMLEntityManager$RewindableInputStream.read(Unknown Source)
        at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
        at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        at com.sun.org.apache.xalan.internal.xsltc.compiler.Parser.parse(Parser.java:431)
        at com.sun.org.apache.xalan.internal.xsltc.compiler.Parser.parse(Parser.java:506)
        at com.sun.org.apache.xalan.internal.xsltc.compiler.XSLTC.compile(XSLTC.java:466)
        at com.sun.org.apache.xalan.internal.xsltc.compiler.XSLTC.compile(XSLTC.java:568)
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTemplates(TransformerFactoryImpl.java:974)
        ... 9 more
java.io.IOException: Stream closed
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:839)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:903)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:704)
        at java.io.FilterInputStream.read(FilterInputStream.java:83)
        at org.apache.xerces.impl.XMLEntityManager$RewindableInputStream.read(Unknown Source)
        at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
        at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        at com.sun.org.apache.xalan.internal.xsltc.compiler.Parser.parse(Parser.java:431)
        at com.sun.org.apache.xalan.internal.xsltc.compiler.Parser.parse(Parser.java:506)
        at com.sun.org.apache.xalan.internal.xsltc.compiler.XSLTC.compile(XSLTC.java:466)
        at com.sun.org.apache.xalan.internal.xsltc.compiler.XSLTC.compile(XSLTC.java:568)
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTemplates(TransformerFactoryImpl.java:974)
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTransformer(TransformerFactoryImpl.java:789)
        at hdfs_rw.xmlToCsv.xmlTransform(xmlToCsv.java:55)
        at hdfs_rw.hdfs_rw.main(hdfs_rw.java:64)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

1 answer:

Answer 0 (score: 1)

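Move the line

        Transformer transformer = TransformerFactory.newInstance().newTransformer(stylesource);

out of the for loop, so that it runs once before the loop starts. The stack trace points at the cause: stylesource wraps a single HDFS input stream, which is consumed and closed when the stylesheet is first compiled, so the second call to newTransformer(stylesource) tries to read from a stream that is already closed.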
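
A minimal sketch of that change, assuming nothing beyond the JDK: the stylesheet is compiled into a Transformer once, before the loop, and the same Transformer is reused for every XML string. To keep the example runnable without a cluster, a ByteArrayInputStream stands in for the stream returned by hdfs.open(style_path), and the results are collected as Strings rather than written through hdfs.create(out_path). The class name XmlToCsvSketch, the method transformAll, and the tiny stylesheet and documents in main are hypothetical and only serve as illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XmlToCsvSketch {

    // Applies one stylesheet to every XML string and returns the outputs in order.
    static List<String> transformAll(InputStream styleStream, String[] xmlDocs)
            throws TransformerException {
        // The stylesheet stream is read exactly once, here, outside the loop.
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(styleStream));

        List<String> results = new ArrayList<>();
        for (String xml : xmlDocs) {
            // Each iteration only needs a new source and a new result;
            // the compiled stylesheet is reused as-is.
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            results.add(out.toString());
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stylesheet: emits the two child values separated by a comma.
        String xsl =
            "<xsl:stylesheet version=\"1.0\""
          + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"text\"/>"
          + "<xsl:template match=\"/row\">"
          + "<xsl:value-of select=\"a\"/>,<xsl:value-of select=\"b\"/>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";
        String[] docs = {
            "<row><a>1</a><b>2</b></row>",
            "<row><a>3</a><b>4</b></row>"
        };

        // Stand-in for the FSDataInputStream opened from HDFS in the question.
        InputStream styleStream =
            new ByteArrayInputStream(xsl.getBytes(StandardCharsets.UTF_8));

        for (String csv : transformAll(styleStream, docs)) {
            System.out.println(csv);
        }
    }
}
```

In the question's code the equivalent change is to create the Transformer right after stylesource and leave only the parsing, output path, and transform call inside the loop. If the transformations ever need to run on several threads, the standard JAXP alternative is TransformerFactory.newTemplates(...), which yields a reusable Templates object from which a separate Transformer can be created per thread.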