I am trying to parse streaming data with Spark Streaming. I receive the input from Kafka, convert it to a JavaPairInputDStream, and then to an RDD. My RDD looks like this:
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
I receive the data line by line. Next, I try to parse it with a StAX parser. Here is my code:
XMLStreamReader reader;
XMLInputFactory factory = XMLInputFactory.newInstance();
InputStream in = IOUtils.toInputStream(items._2, "UTF-8");
reader = factory.createXMLStreamReader(in);
When I try it this way, I get:
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,7]
Message: XML document structures must start and end within the same entity.
When I try it this way instead:
reader = factory.createXMLStreamReader(new FileReader(items._2));
I get:
16/12/27 15:17:26 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 (TID 1)
java.io.FileNotFoundException: <note> (No such file or directory)
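
Both errors can be reproduced outside Spark, which may help narrow things down. The sketch below assumes that each record value (items._2) holds only a single line such as <note>, which would be consistent with the parse error at column 7 but is not confirmed by anything above. The class name StaxFragmentDemo and the helpers countStartElements and parses are hypothetical names used only for illustration. It also uses a StringReader, because FileReader(String) treats its argument as a file path, which would explain the FileNotFoundException for <note>.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class: compares parsing a whole document with parsing one line of it.
public class StaxFragmentDemo {

    // Walks the StAX event stream and returns the number of START_ELEMENT events.
    public static int countStartElements(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // StringReader parses the string content itself; FileReader would look for a file.
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    // Returns true if the string parses as a complete XML document.
    public static boolean parses(String xml) {
        try {
            countStartElements(xml);
            return true;
        } catch (XMLStreamException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String whole = "<note>\n"
                + "<to>Tove</to>\n"
                + "<from>Jani</from>\n"
                + "<heading>Reminder</heading>\n"
                + "<body>Don't forget me this weekend!</body>\n"
                + "</note>";
        // Assumption: a single streamed record contains only the first line.
        String firstLineOnly = "<note>";

        System.out.println("whole document start elements: " + countStartElements(whole));
        System.out.println("first line parses: " + parses(firstLineOnly));
    }
}
```

If the assumption holds, the whole document parses and yields five start elements, while the single line fails with an XMLStreamException because the note element is never closed. In that case the lines of one document would need to be assembled into a single string before being handed to the parser.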