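I am using PDFBox to convert PDFs to text, but I have multiple files in a folder, and each one needs to be written to its own .txt file. My source code is: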
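import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

public class PDFconversion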
{
public static void main(final String[] args) throws IOException,SAXException, TikaException
{
//Assume the source PDF ("sourcefile") is in your current directory
File file = new File("sourcefile");
//parse method parameters
FileInputStream inputstream = new FileInputStream(file);
BodyContentHandler handler = new BodyContentHandler();
Metadata metadata = new Metadata();
metadata.set("org.apache.tika.parser.pdf.sortbyposition", "true");
ParseContext pcontext = new ParseContext();
PDFParser pdfparser = new PDFParser();
System.out.println("Parsing PDF to TEXT...");
pdfparser.parse(inputstream, handler, metadata, pcontext);
//write the extracted text; try-with-resources closes the writer so the output is flushed to disk
try (FileWriter fw = new FileWriter("targetfile")) {
fw.write(handler.toString().trim());
}
//System.out.println("Contents of the document:" + handler.toString());
}
}
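Answer 0 (score: 1)
How about 'java -jar tika-app.jar -t -i #input_dir# -o #output_dir#'? This invokes batch mode, which converts an entire directory into a mirrored directory of .txt files, or of .json files with the '-J' option.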
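If you would rather stay inside your own Java program than call tika-app, the same "one .txt per PDF" idea can be sketched as a loop over the folder. This is only a minimal sketch under some assumptions: the class name `BatchPdfToText`, the method `convertAll`, and the `TextExtractor` hook are all hypothetical names introduced here, and the extraction step is passed in as a function so the sketch compiles without Tika on the classpath.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class BatchPdfToText {

    /** Hypothetical hook: turns one PDF file into its extracted text. */
    @FunctionalInterface
    public interface TextExtractor {
        String extract(Path pdf) throws Exception;
    }

    /**
     * Converts every *.pdf directly inside inputDir into a same-named .txt
     * file in outputDir and returns the number of files written.
     */
    public static int convertAll(Path inputDir, Path outputDir, TextExtractor extractor)
            throws IOException {
        Files.createDirectories(outputDir);
        int converted = 0;
        // The glob keeps only entries whose names end in .pdf.
        try (DirectoryStream<Path> pdfs = Files.newDirectoryStream(inputDir, "*.pdf")) {
            for (Path pdf : pdfs) {
                String name = pdf.getFileName().toString();
                String base = name.substring(0, name.length() - ".pdf".length());
                Path target = outputDir.resolve(base + ".txt");
                try {
                    String text = extractor.extract(pdf).trim();
                    Files.write(target, text.getBytes(StandardCharsets.UTF_8));
                    converted++;
                } catch (Exception e) {
                    // One unreadable PDF should not abort the whole batch.
                    System.err.println("Skipping " + pdf + ": " + e);
                }
            }
        }
        return converted;
    }
}
```

In your setup the extractor would wrap the parsing code from the question: open the PDF as an input stream, create a fresh BodyContentHandler and Metadata for each file, call pdfparser.parse(...), and return handler.toString(). Creating a new handler per file matters, because a reused handler would keep accumulating the text of earlier documents.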