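Is there a way to configure Apache Tika to parse data in chunks? Suppose the data arrives as 10 chunks. Can Tika parse each chunk as it is received, or can it only parse once all 10 chunks are available? This is my current code: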
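import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.ToXMLContentHandler;
import org.apache.tika.sax.XHTMLContentHandler;
import org.xml.sax.SAXException;

public OutputStream parse(InputStream instream) {
    OutputStream outstream = new ByteArrayOutputStream();
    // h collects the SAX events as XML text; h1 wraps it as an XHTML handler.
    ToXMLContentHandler h = new ToXMLContentHandler();
    AutoDetectParser parser = new AutoDetectParser();
    ParseContext context = new ParseContext();
    Metadata metadata = new Metadata();
    XHTMLContentHandler h1 = new XHTMLContentHandler(h, metadata);
    try {
        // Detects the document type and parses the whole stream in one call.
        parser.parse(instream, h1, metadata, context);
        // Read the serialized XHTML from the underlying ToXMLContentHandler.
        outstream.write(h.toString().getBytes(StandardCharsets.UTF_8));
    } catch (TikaException | SAXException | IOException e) {
        e.printStackTrace();
    }
    return outstream;
}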
Any thoughts on this?
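
One possible direction, offered as a sketch and not as a Tika feature: since `parser.parse(...)` takes a single `InputStream`, the chunks could be exposed as one stream that blocks until the next chunk arrives. The example below uses only the JDK. The class `ChunkedInput`, the method `fromQueue`, and the `END` marker are hypothetical names introduced for illustration, and it assumes the chunks are delivered as `byte[]` arrays on a `BlockingQueue`.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.Enumeration;
import java.util.NoSuchElementException;
import java.util.concurrent.BlockingQueue;

/** Hypothetical helper: exposes chunks arriving on a queue as one InputStream. */
final class ChunkedInput {
    /** Hypothetical end-of-data marker, compared by identity, not by content. */
    static final byte[] END = new byte[0];

    /**
     * Returns a stream that reads the queued chunks in order.
     * Note: this call blocks until the first chunk (or END) is available,
     * because SequenceInputStream asks for its first element on construction.
     */
    static InputStream fromQueue(BlockingQueue<byte[]> queue) {
        Enumeration<InputStream> streams = new Enumeration<InputStream>() {
            private byte[] next;   // chunk fetched but not yet handed out
            private boolean done;  // true once END has been seen

            @Override
            public boolean hasMoreElements() {
                if (done) {
                    return false;
                }
                if (next == null) {
                    try {
                        next = queue.take(); // blocks until the next chunk arrives
                    } catch (InterruptedException e) {
                        // Assumption: an interrupt simply ends the stream early.
                        Thread.currentThread().interrupt();
                        done = true;
                        return false;
                    }
                    if (next == END) {
                        done = true;
                        next = null;
                        return false;
                    }
                }
                return true;
            }

            @Override
            public InputStream nextElement() {
                if (!hasMoreElements()) {
                    throw new NoSuchElementException();
                }
                InputStream in = new ByteArrayInputStream(next);
                next = null;
                return in;
            }
        };
        return new SequenceInputStream(streams);
    }
}
```

With this in place, a consumer thread could call `parse(ChunkedInput.fromQueue(queue))` while a producer thread enqueues each chunk as it is received and finally enqueues `ChunkedInput.END`. Whether output is actually produced before the last chunk arrives depends on the parser chosen for the detected format; some container formats may need the complete stream before any content can be extracted.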