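I have a long list of entries like the ones below, and I want to split each line at its first space into a word and its meaning. I am using Apache POI because I have to read a docx file and then extract the data from it.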
abash humiliate, embarrass
abdicate relinquish power or position
aberrant abnormal
abet aid, encourage (typically of crime)
abeyance postponement
aboriginal indigenous
abridge shorten
abstemious moderate
...
Is a regular expression suitable for this, so that I can display each entry like this:
word :abash
meaning : humiliate, embarrass
...
My code is:
import java.io.FileInputStream;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class WordFileReader {
    /**
     * @param args
     */
    public static void main(String[] args) {
        try {
            FileInputStream fis = new FileInputStream("E:\\important.docx");
            // Extract the whole document as plain text
            XWPFWordExtractor oleTextExtractor = new XWPFWordExtractor(new XWPFDocument(fis));
            System.out.print(oleTextExtractor.getText());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Edit: Based on the suggested answer, I am now using this:
public static void main(String[] args) {
    try {
        FileInputStream fis = new FileInputStream("E:\\Words.docx");
        org.apache.poi.xwpf.extractor.XWPFWordExtractor oleTextExtractor = new XWPFWordExtractor(new XWPFDocument(fis));
        //System.out.print(oleTextExtractor.getText());
        Scanner sc = new Scanner(oleTextExtractor.getText());
        while(sc.hasNextLine()) {
            String line = sc.nextLine();
            int i = line.indexOf(' ');
            String word = line.substring(0, i);
            String meaning = line.substring(i).trim();
            System.out.println("word "+word);
            System.out.println("meaning "+meaning);
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
But I get:
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(Unknown Source)
at WordFileReader.main(WordFileReader.java:25)
Answer 0 (score: 3)
I would use java.util.Scanner to extract the lines from the text:
Scanner sc = new Scanner(oleTextExtractor.getText());
while(sc.hasNextLine()) {
String line = sc.nextLine();
...
Then I would split the line into the word and its meaning:
int i = line.indexOf(' ', 2); // start at index 2 so a leading article "a" is not split off
String word = line.substring(0, i);
String meaning = line.substring(i).trim();
or
String[] parts = line.split("(?<!^a)\\s+", 2);
String word = parts[0];
String meaning = parts[1];
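The `-1` in the question's stack trace means `indexOf` found no space on some line, so `substring(0, -1)` threw. Which line that was is not shown; blank lines in the extracted text, or a tab used as the separator, are plausible guesses and only assumptions here. A minimal sketch of a guarded version follows. The names `EntryParser`, `parseLine` and `parseAll` are hypothetical, and the sample text stands in for `oleTextExtractor.getText()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class EntryParser {

    /** Splits one line at its first space; returns null when there is no space to split on. */
    static String[] parseLine(String line) {
        String trimmed = line.trim();
        int i = trimmed.indexOf(' ');
        if (i < 0) {
            return null; // blank line or a lone word: indexOf returned -1
        }
        return new String[] { trimmed.substring(0, i), trimmed.substring(i + 1).trim() };
    }

    /** Parses every line of the text, silently skipping lines that cannot be split. */
    static List<String[]> parseAll(String text) {
        List<String[]> entries = new ArrayList<>();
        try (Scanner sc = new Scanner(text)) {
            while (sc.hasNextLine()) {
                String[] entry = parseLine(sc.nextLine());
                if (entry != null) {
                    entries.add(entry);
                }
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // Stand-in for the extracted document text; note the blank line in the middle.
        String text = "abash humiliate, embarrass\n\nabet aid, encourage (typically of crime)\n";
        for (String[] entry : parseAll(text)) {
            System.out.println("word : " + entry[0]);
            System.out.println("meaning : " + entry[1]);
        }
    }
}
```

If the document turns out to separate word and meaning with a tab, searching for a single space will still miss it; splitting on `\\s+` with a limit of 2 would cover that case.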
Answer 1 (score: 1)
Use java.lang.String.split(String regex, int limit):
String[] parts = line.split("\\s", 2); // limit 2: the word, then the rest of the line
String word = parts[0];
String meaning = parts[1];
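The `limit` argument caps the number of array elements, so its value matters here: with a limit of 1 the pattern is never applied and the whole line comes back as a single element, which makes `parts[1]` fail. A limit of 2 yields the word plus everything after the first whitespace. A small sketch to illustrate; the class and method names are made up for the demonstration, and `\\s+` is used so that a run of spaces or a tab counts as one separator.

```java
final class SplitLimitDemo {

    /** Thin wrapper around String.split so the effect of the limit is easy to compare. */
    static String[] splitWithLimit(String line, int limit) {
        return line.split("\\s+", limit);
    }

    public static void main(String[] args) {
        String line = "abet aid, encourage (typically of crime)";
        for (int limit = 1; limit <= 2; limit++) {
            String[] parts = splitWithLimit(line, limit);
            // prints "limit 1 -> 1 part(s)" and then "limit 2 -> 2 part(s)"
            System.out.println("limit " + limit + " -> " + parts.length + " part(s)");
        }
    }
}
```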
Answer 2 (score: 0)
You can use substring as follows:
int index = line.indexOf(" ");
String result = "word: " + line.substring(0, index) + "\nmeaning: " + line.substring(index + 1);
Answer 3 (score: 0)
The code below works fine for me. I use a BufferedReader to read the text from the file.
BufferedReader br=null;
try {
    br = new BufferedReader(new FileReader("C:\\test.txt"));
} catch (FileNotFoundException ex) {
    Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
}
try {
    StringBuilder sb = new StringBuilder();
    String line="";
    String [] parts=null;
    String everything="",word="",meaning="";
    try {
        line = br.readLine();
    } catch (IOException ex) {
        Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
    }
    while (line != null) {
        sb.append(line);
        parts= line.split(" ",2);
        word=parts[0];
        meaning=parts[1];
        System.out.println("word:"+word.toString());
        System.out.println("meaning:"+meaning.toString());
        sb.append("\n");
        try {
            line = br.readLine();
        } catch (IOException ex) {
            Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
} finally {
    try {
        br.close();
    } catch (IOException ex) {
        Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
    }
}
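The same read-and-split loop can be written more compactly with try-with-resources, which closes the reader automatically. The following is only a sketch under a few assumptions: `GlossaryReader` and `read` are hypothetical names, the input comes from any `Reader` (a `StringReader` in the demo, a `FileReader` in practice), and lines without a space are skipped rather than reported.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

final class GlossaryReader {

    /** Reads "word meaning" lines into an insertion-ordered map, skipping lines with no space. */
    static Map<String, String> read(Reader source) {
        Map<String, String> glossary = new LinkedHashMap<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                String[] parts = line.trim().split(" ", 2);
                if (parts.length == 2) {
                    glossary.put(parts[0], parts[1].trim());
                }
            }
        } catch (IOException e) {
            // Rethrown unchecked so callers are not forced to declare IOException.
            throw new UncheckedIOException(e);
        }
        return glossary;
    }

    public static void main(String[] args) {
        Reader demo = new StringReader("aberrant abnormal\n\nabeyance postponement\n");
        read(demo).forEach((word, meaning) -> {
            System.out.println("word:" + word);
            System.out.println("meaning:" + meaning);
        });
    }
}
```

A map keeps only the last meaning if a word appears twice; a list of pairs would be the better fit if duplicates need to be preserved.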