Is there a way to extract text from a .tex file?

Time: 2019-08-22 06:40:17

Tags: latex tex pdflatex

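I am writing a program that counts the words in a file, and I am having trouble parsing .tex files.

The code has to run on a website, which must count the words in each uploaded file. I have a working version, but I am looking for a better solution.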

case "application/x-tex": // Skip every token that contains '\' and count the rest
            // Needs java.util.Scanner, java.net.URL and java.io.IOException.
            // try-with-resources closes the Scanner even if reading fails.
            try (Scanner sc1 = new Scanner(new URL(URLPath).openStream())) {
                while (sc1.hasNext()) {
                    String str = sc1.next();
                    if (!str.contains("\\")) {
                        System.out.print(str + " ");
                        wordCount++;
                    }
                }
            } catch (IOException e) {
                System.out.println("There was a problem while reading File on the URL");
                break;
            }
            if (wordCount <= 0) {
                System.out.println("Total count is " + wordCount
                        + ". The uploaded File is either empty or it consists of Images only");
            } else {
                System.out.println("");
                System.out.println("**********");
                System.out.println("Word Count: " + wordCount);
                System.out.println("**********");
                System.out.println("");
            }
            break;

I expect a String as output that I can then use to count the words.
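
To make the goal concrete, here is a rough sketch of how such a String could be produced in plain Java by stripping comments, math, and command names with regular expressions. It is only an illustration under stated assumptions: `TexTextExtractor`, `extractText`, and `countWords` are hypothetical names, the list of commands whose arguments are dropped is an arbitrary sample, and nested braces, verbatim environments, and user-defined macros are not handled. A real TeX-aware tool will be more accurate.

```java
import java.util.regex.Pattern;

final class TexTextExtractor {
    // An unescaped '%' starts a comment that runs to the end of the line.
    private static final Pattern COMMENT = Pattern.compile("(?<!\\\\)%.*");
    // Display math ($$...$$) and inline math ($...$), ignoring escaped \$.
    private static final Pattern MATH = Pattern.compile(
            "(?<!\\\\)\\$\\$.*?(?<!\\\\)\\$\\$|(?<!\\\\)\\$.*?(?<!\\\\)\\$",
            Pattern.DOTALL);
    // Commands whose braced argument is not prose (assumed sample list).
    private static final Pattern NON_PROSE = Pattern.compile(
            "\\\\(?:begin|end|documentclass|usepackage|label|ref|cite|includegraphics)"
            + "\\*?(?:\\[[^\\]]*\\])?\\{[^}]*\\}");
    // Any other command name with an optional [..] argument; its {..} text is kept.
    private static final Pattern COMMAND =
            Pattern.compile("\\\\[a-zA-Z]+\\*?(?:\\[[^\\]]*\\])?");
    // Escaped special characters such as \% or \&.
    private static final Pattern ESCAPED = Pattern.compile("\\\\([%&$#_{}])");

    /** Returns an approximation of the visible text of a LaTeX source. */
    static String extractText(String tex) {
        String s = COMMENT.matcher(tex).replaceAll("");
        s = MATH.matcher(s).replaceAll(" ");
        s = NON_PROSE.matcher(s).replaceAll(" ");
        s = COMMAND.matcher(s).replaceAll(" ");
        s = ESCAPED.matcher(s).replaceAll("$1"); // \% becomes %
        s = s.replace("\\", " ");                // leftovers such as "\\" line breaks
        s = s.replaceAll("[{}~]", " ");          // grouping braces and ties
        return s.replaceAll("\\s+", " ").trim();
    }

    /** Counts whitespace-separated tokens in already extracted text. */
    static int countWords(String text) {
        return text.isEmpty() ? 0 : text.split(" ").length;
    }

    public static void main(String[] args) {
        String tex = "\\section{Intro}\nHello \\textbf{brave} world, $x^2$ is math.\n";
        String text = extractText(tex);
        System.out.println(text + " -> " + countWords(text) + " words");
    }
}
```

Unlike the `str.contains("\\")` check above, this keeps the text inside arguments such as `\textbf{brave}` instead of discarding the whole token.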

1 Answer:

Answer 0 (score: 0)

// Trigger the Perl script

URL website = new URL(URLPath);
Path path = Paths.get("myfile.tex");
bufferFiles.add(path.toFile()); // same file as `path`; the original used "myFile.tex", which differs in case
try (InputStream in = website.openStream()) {
    Files.copy(in, path, StandardCopyOption.REPLACE_EXISTING);
}

URL texcount = new URL("https://papertrue.s3.us-west-1.amazonaws.com/draft/77e3c992-b70f-4711-8b9b-eaf390617bb8");
Path path1 = Paths.get("texcount.pl");
bufferFiles.add(new File("texcount.pl"));
try (InputStream in = texcount.openStream()) {
    Files.copy(in, path1, StandardCopyOption.REPLACE_EXISTING);
}

wordCount = 0;
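try {
    // Note: this runs the copy of texcount.pl installed under /etc/papertrue,
    // not the one downloaded into the working directory above.
    Process process = Runtime.getRuntime()
            .exec(new String[] {"/etc/papertrue/texcount.pl", "myfile.tex"});
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(process.getInputStream()))) {
        String line;
        while ((line = br.readLine()) != null) {
            // Each summary line looks like "label: value"; take the last field.
            String[] parts = line.split(":\\s");
            String value = parts[parts.length - 1].trim();
            log.debug(value);
            try {
                wordCount += Integer.parseInt(value);
            } catch (NumberFormatException e) {
                // Not a numeric line (for example the file name); skip it.
            }
        }
    }

    // waitFor() returns the exit code once the script has finished.
    if (process.waitFor() == 0) {
        log.debug("Command Successful");
    } else {
        log.debug("Command Failure");
    }
    log.debug(wordCount);
} catch (IOException e) {
    log.debug("There was a problem while running texcount.pl");
    e.printStackTrace();
    break;
} catch (InterruptedException e) {
    Thread.currentThread().interrupt(); // restore the interrupt flag
    break;
}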
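
The loop above adds up every numeric value the script prints, so any counts that are not words (headers, floats, math) would end up in the total as well. A possible refinement, sketched below, is to sum only the word lines. The labels matched here ("Words in text", "Words in headers", "Words outside text") are an assumption about texcount's default summary format and should be checked against the installed version; `TexCountOutput` and `totalWords` are hypothetical names.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TexCountOutput {
    // Matches e.g. "Words in text: 1234" or "Words outside text (captions, etc.): 34".
    // The labels are assumed from texcount's default summary output.
    private static final Pattern WORD_LINE = Pattern.compile(
            "^Words (?:in text|in headers|outside text[^:]*):\\s*(\\d+)$");

    /** Sums the word counts found in the given output lines; other lines are ignored. */
    static int totalWords(List<String> lines) {
        int total = 0;
        for (String line : lines) {
            Matcher m = WORD_LINE.matcher(line.trim());
            if (m.matches()) {
                total += Integer.parseInt(m.group(1));
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "File: myfile.tex",
                "Words in text: 1234",
                "Words in headers: 12",
                "Number of headers: 5");
        System.out.println(totalWords(sample));
    }
}
```

The lines read from `br.readLine()` could be collected into a list and passed to `totalWords` in place of the `Integer.parseInt` accumulation.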