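Split a sentence into two strings and display them iteratively

Asked: 2013-06-10 08:31:13

Tags: java

I have a long list of words like the one below, and I want to split each line into the word and its meaning at the first space. I am using Apache POI because I have to read a docx file and then pull the data out of it.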

    abash  humiliate, embarrass
    abdicate  relinquish power or position
    aberrant  abnormal
    abet  aid, encourage (typically of crime)
    abeyance  postponement
    aboriginal  indigenous 
    abridge  shorten
    abstemious  moderate
...

Which regular expression would suit my purpose, so that I can display it like this:

word :abash
meaning : humiliate, embarrass
...

My code is:

import java.io.FileInputStream;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class WordFileReader {

    public static void main(String[] args) {
        // Open the .docx file and print all of its text as one string
        try (FileInputStream fis = new FileInputStream("E:\\important.docx")) {
            XWPFWordExtractor oleTextExtractor = new XWPFWordExtractor(new XWPFDocument(fis));
            System.out.print(oleTextExtractor.getText());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

}

-- Edit -- Based on the suggested answers, I am now using this:

public static void main(String[] args) {
         try {
                FileInputStream fis = new FileInputStream("E:\\Words.docx");
                org.apache.poi.xwpf.extractor.XWPFWordExtractor oleTextExtractor = new XWPFWordExtractor(new XWPFDocument(fis));
                //System.out.print(oleTextExtractor.getText());

                Scanner sc = new Scanner(oleTextExtractor.getText());            
                while(sc.hasNextLine()) {
                 String line = sc.nextLine();
                 int i = line.indexOf(' ');
                 String word = line.substring(0, i);
                 String meaning = line.substring(i).trim();

                 System.out.println("word "+word);
                 System.out.println("meaning "+meaning);
                }

            } catch (Exception e) {
                    e.printStackTrace();
            }

    }

But I get:

java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.substring(Unknown Source)
    at WordFileReader.main(WordFileReader.java:25)

4 answers:

Answer 0 (score: 3)

I would use java.util.Scanner to extract the lines from the text:

Scanner sc = new Scanner(oleTextExtractor.getText());            
while(sc.hasNextLine()) {
    String line = sc.nextLine();
    ...

Then I would split each line into the word and its meaning:

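 // Option 1: indexOf, searching from position 2 so a leading article "a " is not split off
 int i = line.indexOf(' ', 2);
 String word = line.substring(0, i);
 String meaning = line.substring(i).trim();

 // Option 2 (use instead of option 1): split once on whitespace
 // that is not preceded by an "a" at the start of the line
 String[] parts = line.split("(?<!^a)\\s+", 2);
 String word = parts[0];
 String meaning = parts[1];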
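
The `StringIndexOutOfBoundsException` in the question is consistent with `indexOf` returning -1 for a line that contains no space (a blank line in the extracted text, for instance), which `substring` then rejects. A minimal sketch of a guarded version follows. The class name `EntryParser` and the method `parse` are hypothetical, the input is a hard-coded sample rather than the docx text, and the search starts at position 0, so the leading-article case is not handled here.

```java
import java.util.Scanner;

// Hypothetical helper for illustration; not part of the original answer.
class EntryParser {

    /**
     * Splits "word  meaning" at the first space.
     * Returns null when the line has no space to split on.
     */
    static String[] parse(String line) {
        String trimmed = line.trim();
        int i = trimmed.indexOf(' ');
        if (i < 0) {
            // No space found: substring(0, -1) would throw, so skip instead
            return null;
        }
        return new String[] { trimmed.substring(0, i), trimmed.substring(i).trim() };
    }

    public static void main(String[] args) {
        // Sample text with a blank line, standing in for oleTextExtractor.getText()
        String text = "abash  humiliate, embarrass\n\nabridge  shorten\n";
        try (Scanner sc = new Scanner(text)) {
            while (sc.hasNextLine()) {
                String[] entry = parse(sc.nextLine());
                if (entry == null) {
                    continue; // blank or single-token line
                }
                System.out.println("word : " + entry[0]);
                System.out.println("meaning : " + entry[1]);
            }
        }
    }
}
```

Returning null keeps the sketch short; skipping silently is a design choice, and logging the rejected line may be more useful when checking a real document.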

Answer 1 (score: 1)

Use java.lang.String.split(String regex, int limit):

String[] parts = line.split("\\s+", 2);  // limit 2: the word, then the rest of the line
String word = parts[0];
String meaning = parts[1];
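
The `limit` argument of `split` is the maximum number of pieces returned, so a limit of 1 leaves the line unsplit (and `parts[1]` would then be out of bounds), while a limit of 2 yields the word and everything after it. A small sketch to show the difference; the class `SplitLimitDemo` and the method `splitEntry` are hypothetical names used only for illustration.

```java
// Hypothetical demo class; only String.split itself comes from the JDK.
class SplitLimitDemo {

    /** Splits on runs of whitespace into at most `limit` pieces. */
    static String[] splitEntry(String line, int limit) {
        return line.split("\\s+", limit);
    }

    public static void main(String[] args) {
        String line = "abet  aid, encourage (typically of crime)";

        // limit 1: the pattern is never applied, the whole line comes back
        System.out.println(splitEntry(line, 1).length); // prints 1

        // limit 2: the pattern is applied once, at the first run of whitespace
        String[] parts = splitEntry(line, 2);
        System.out.println(parts.length);               // prints 2
        System.out.println("word : " + parts[0]);
        System.out.println("meaning : " + parts[1]);
    }
}
```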

Answer 2 (score: 0)

You can use substring as follows:

int index = line.indexOf(" ");

System.out.println("word: " + line.substring(0, index) + "\nmeaning: " + line.substring(index + 1));

Answer 3 (score: 0)

The code below works fine for me. I use a BufferedReader to read the text from a file.

// Read the file line by line; try-with-resources closes the reader automatically
try (BufferedReader br = new BufferedReader(new FileReader("C:\\test.txt"))) {
    String line;
    while ((line = br.readLine()) != null) {
        // Split at the first space only: parts[0] is the word, parts[1] the meaning
        String[] parts = line.split(" ", 2);
        if (parts.length < 2) {
            continue; // blank line or a line without a space: nothing to split
        }
        String word = parts[0];
        String meaning = parts[1].trim();

        System.out.println("word:" + word);
        System.out.println("meaning:" + meaning);
    }
} catch (IOException ex) {
    // FileNotFoundException is an IOException, so a missing file is logged here too
    Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
}
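
Since the question asks about a regular expression, the same split can also be written with `java.util.regex`, capturing the first token and the remainder as two groups. This is only a sketch under assumptions: the class `EntryRegex`, the method `format`, and the pattern itself are illustrative choices, not something taken from the answers above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration.
class EntryRegex {

    // Group 1: first run of non-whitespace (the word).
    // Group 2: the rest of the line, without surrounding whitespace (the meaning).
    private static final Pattern ENTRY = Pattern.compile("\\s*(\\S+)\\s+(.+?)\\s*");

    /** Returns "word : X" and "meaning : Y" on two lines, or null if the line does not match. */
    static String format(String line) {
        Matcher m = ENTRY.matcher(line);
        if (!m.matches()) {
            return null; // blank line or a line with a single token
        }
        return "word : " + m.group(1) + "\nmeaning : " + m.group(2);
    }

    public static void main(String[] args) {
        String[] lines = { "abash  humiliate, embarrass", "", "aboriginal  indigenous " };
        for (String line : lines) {
            String formatted = format(line);
            if (formatted != null) {
                System.out.println(formatted);
            }
        }
    }
}
```

For a one-off split, `indexOf` or `split` with a limit of 2 is simpler; a compiled pattern mainly helps when the line format needs stricter validation.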