Question

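I have a method that receives pdfText (a string containing the text of a parsed PDF file) and fileName, the file I want to write this text to.

Now I need to find the word "Keywords" in this text and extract only the words that follow it on the same line (up to the newline).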

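For example, my text contains the following lines:

Title: Something

Keywords : Computers, Robots, Course

Tags: TAG1, TAG2, TAG3

The result should be the list ["Computers", "Robots", "Course"].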

Solution

So I searched for how to solve my problem... and found a solution. It's not very clever, but it works:

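    // index of the first appearance of the word
    int index = pdfText.indexOf("Keywords");

    if (index != -1) {
        // string from that index to the end of the text
        String subStr = pdfText.substring(index);

        // index of the first newline in the new string (end of text if there is none)
        int index1 = subStr.indexOf("\n");
        if (index1 == -1) {
            index1 = subStr.length();
        }

        // the line we need, keeping only what follows the first ':'
        String line = subStr.substring(0, index1);
        String theString = line.substring(line.indexOf(':') + 1).trim();

        System.out.println(theString);

        // write to the file; true means append the text instead of overwriting it,
        // and try-with-resources closes the writer even if write() fails
        try (FileWriter pw = new FileWriter(fileName, true)) {
            pw.write(theString);
        }
    }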
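The snippet above produces the raw text after the label, while the question asks for a list. A minimal sketch, assuming the label is spelled exactly "Keywords", is followed by a colon on the same line, and the values are comma-separated; the class and method names (`KeywordExtractor`, `extractKeywords`) are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class KeywordExtractor {

    // Hypothetical helper: returns the comma-separated values that follow
    // "Keywords" on the same line, or an empty list if the label is missing.
    public static List<String> extractKeywords(String pdfText) {
        int start = pdfText.indexOf("Keywords");
        if (start == -1) {
            return Collections.emptyList();
        }
        // End of the line: the next '\n', or the end of the text
        int end = pdfText.indexOf('\n', start);
        if (end == -1) {
            end = pdfText.length();
        }
        String line = pdfText.substring(start, end);
        // Keep only what follows the first ':' on that line
        int colon = line.indexOf(':');
        if (colon == -1) {
            return Collections.emptyList();
        }
        // trim() also drops a trailing '\r' from Windows line endings
        String values = line.substring(colon + 1).trim();
        if (values.isEmpty()) {
            return Collections.emptyList();
        }
        // Split on commas, ignoring the whitespace around them
        return new ArrayList<>(Arrays.asList(values.split("\\s*,\\s*")));
    }

    public static void main(String[] args) {
        String text = "Title: Something\nKeywords : Computers, Robots, Course\nTags: TAG1, TAG2, TAG3";
        System.out.println(extractKeywords(text)); // prints [Computers, Robots, Course]
    }
}
```

Returning an empty list when the label is absent avoids the `StringIndexOutOfBoundsException` that `substring(-1)` would throw.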

Answer 1

Honestly, this question is a bit too specific. Anyway :)

Writing to a file

String pdfText = "pdfText";
String fileLocation = "fileLocation";
Writer writer = null;
try {
    writer = new BufferedWriter(new OutputStreamWriter(
            new FileOutputStream(fileLocation), "utf-8"));
    writer.write(pdfText);     // String you want to write (i.e. pdfText)
} catch (IOException ioe) {
    ioe.printStackTrace();
} finally {
    try { if (writer != null) writer.close(); } catch (IOException ex) { ex.printStackTrace(); }
}

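It's always a good idea to specify the encoding ("UTF-8"), although it may not matter for your task. You may also want to append to the file instead of overwriting it completely; in that case, use a different FileOutputStream constructor: new FileOutputStream(fileLocation, true). As for all the try/catch blocks, please don't follow my example. That's just how I managed to close my resources, because Eclipse suggested it, haha.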
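The nested try/catch blocks can be replaced by try-with-resources (Java 7+), which closes the writer automatically. A minimal sketch, assuming UTF-8 and append mode are what you want; `AppendWriter`, `appendText`, and the file name `keywords.txt` are hypothetical:

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class AppendWriter {

    // Appends text to fileLocation, encoded as UTF-8.
    // The 'true' argument puts FileOutputStream in append mode;
    // try-with-resources closes the writer even if write() throws.
    public static void appendText(String fileLocation, String text) throws IOException {
        try (Writer writer = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(fileLocation, true), StandardCharsets.UTF_8))) {
            writer.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // hypothetical file name
        appendText("keywords.txt", "Computers, Robots, Course\n");
    }
}
```

Passing `StandardCharsets.UTF_8` instead of the string `"utf-8"` also removes the checked `UnsupportedEncodingException` path.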

Parsing the string

If you have a line "Keywords : Computers, Robots, Course":

String str = "Keywords : Computers, Robots, Course";
// take the text after ':', then split on commas, ignoring surrounding whitespace
String[] array = str.substring(str.indexOf(':') + 1).trim().split("\\s*,\\s*");
// array = ["Computers", "Robots", "Course"]

Now that you have an array, you can loop over it to write or print the data you want.
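A minimal sketch of that loop, writing one keyword per line. It targets any `Writer`, so the same method works with a `FileWriter` for a file or a `StringWriter` for testing; `KeywordPrinter` and `writeKeywords` are hypothetical names:

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

public class KeywordPrinter {

    // Writes each keyword followed by the platform line separator.
    public static void writeKeywords(String[] keywords, Writer out) throws IOException {
        for (String keyword : keywords) {
            out.write(keyword);
            out.write(System.lineSeparator());
        }
    }

    public static void main(String[] args) throws IOException {
        String str = "Keywords : Computers, Robots, Course";
        // text after ':', split on commas, ignoring surrounding whitespace
        String[] array = str.substring(str.indexOf(':') + 1).trim().split("\\s*,\\s*");

        StringWriter sw = new StringWriter();
        writeKeywords(array, sw);
        System.out.print(sw); // one keyword per line
    }
}
```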

Answer 2

You can use a regex to extract the words after "Keywords:", like this:

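// (?s) lets '.' match newlines, so the whole text is consumed;
// [^\\n]* stops the captured group at the end of the Keywords line
String regex = "(?s).*Keywords\\s*:([^\\n]*).*";

String extractedLine = yourText.replaceAll(regex, "$1").trim();

System.out.println(extractedLine);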
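One caveat with `replaceAll`: if the text contains no "Keywords" line, nothing matches and the entire text comes back unchanged. A sketch using `Pattern`/`Matcher.find()` separates the two cases; `KeywordRegex` and `findKeywordLine` are hypothetical names, and the pattern assumes the colon sits on the same line as the label:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeywordRegex {

    // "Keywords", optional spaces/tabs, a colon, then the rest of the line.
    // [^\r\n]* stops at the line break, so only the same line is captured.
    private static final Pattern KEYWORDS =
            Pattern.compile("Keywords[ \\t]*:([^\\r\\n]*)");

    // Returns the text after "Keywords:" on its line, or empty if absent.
    public static Optional<String> findKeywordLine(String text) {
        Matcher m = KEYWORDS.matcher(text);
        if (m.find()) {
            return Optional.of(m.group(1).trim());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String text = "Title: Something\nKeywords : Computers, Robots, Course\nTags: TAG1, TAG2, TAG3";
        findKeywordLine(text).ifPresent(System.out::println); // prints Computers, Robots, Course
    }
}
```

`find()` searches anywhere in the input, so no leading `.*` or DOTALL flag is needed.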
