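I want to read the file file.txt token by token, add a part-of-speech tag to each word, and write the result to file2.txt. The content of file.txt is already tokenized. Here is my code: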
public class PoSTagging {
    @SuppressWarnings("resource")
    public static void PoStagMethod() throws IOException {
        FileInputStream fin = new FileInputStream("C:\\Users\\dell\\Desktop\\file.txt");
        DataInputStream in = new DataInputStream(fin);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strline = br.readLine();
        System.out.println(strline + "first");
        try {
            POSModel model = new POSModelLoader().load(new File("en-pos-maxent.bin"));
            PerformanceMonitor perfMon = new PerformanceMonitor(System.err, "sent");
            POSTaggerME tagger = new POSTaggerME(model);
            String input = strline;
            @SuppressWarnings("deprecation")
            ObjectStream<String> lineStream = new PlainTextByLineStream(new StringReader(input));
            perfMon.start();
            String line;
            while ((line = lineStream.read()) != null) {
                String whitespaceTokenizerLine[] = WhitespaceTokenizer.INSTANCE.tokenize(line);
                String[] tags = tagger.tag(whitespaceTokenizerLine);
                POSSample sample = new POSSample(whitespaceTokenizerLine, tags);
                System.out.println(sample.toString() + "second");
                //String t=sample.toString();
                FileOutputStream fout = new FileOutputStream("C:\\Users\\dell\\Desktop\\file2.txt");
                //fout.write(t.getBytes());
                perfMon.incrementCounter();
                fout.close();
            }
            perfMon.stopAndPrintFinalResult();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
When I call PoStagMethod() from another class, only the first word of file.txt is written to file2.txt. Why are the other words in the file not read? What is wrong with my code?
Answer 0 (score: 1)
You can use BufferedReader to read file.txt line by line. Then process each line with your POSModel, and use BufferedWriter to write the output to file2.txt. The following snippet may help:
    POSModel model = new POSModelLoader().load(new File("en-pos-maxent.bin"));
    PerformanceMonitor perfMon = new PerformanceMonitor(System.err, "sent");
    POSTaggerME tagger = new POSTaggerME(model);
    BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter("C:\\Users\\dell\\Desktop\\file2.txt"));
    BufferedReader bufferedReader = new BufferedReader(new FileReader("C:\\Users\\dell\\Desktop\\file.txt"));
    String line;
    while ((line = bufferedReader.readLine()) != null) {
        String[] whitespaceTokenizerLine = WhitespaceTokenizer.INSTANCE.tokenize(line);
        String[] tags = tagger.tag(whitespaceTokenizerLine);
        // Do your work with your tags and tokenized words; as one option,
        // write the same POSSample string that the question's code prints
        bufferedWriter.write(new POSSample(whitespaceTokenizerLine, tags).toString());
        // start a new line in the output file for each input line
        bufferedWriter.newLine();
    }
    // Do not forget to flush() and close() the streams after your job is done:
    bufferedWriter.flush();
    bufferedWriter.close();
    bufferedReader.close();
If you can, it is also worth replacing the old-style try-catch clause with try-with-resources, added in Java 1.7, so that the resources are closed automatically.
Also, if you need to write each word and its tag on a separate line, you may want an inner loop that writes to the file. It would look like this:
    POSModel model = new POSModelLoader().load(new File("en-pos-maxent.bin"));
    PerformanceMonitor perfMon = new PerformanceMonitor(System.err, "sent");
    POSTaggerME tagger = new POSTaggerME(model);
    BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter("C:\\Users\\dell\\Desktop\\file2.txt"));
    BufferedReader bufferedReader = new BufferedReader(new FileReader("C:\\Users\\dell\\Desktop\\file.txt"));
    String line;
    while ((line = bufferedReader.readLine()) != null) {
        String[] whitespaceTokenizerLine = WhitespaceTokenizer.INSTANCE.tokenize(line);
        String[] tags = tagger.tag(whitespaceTokenizerLine);
        for (int i = 0; i < whitespaceTokenizerLine.length; i++) {
            // Do your work with your tags and tokenized words; here each word
            // and its tag go on their own line ("_" is just one possible separator)
            bufferedWriter.write(whitespaceTokenizerLine[i] + "_" + tags[i]);
            bufferedWriter.newLine();
        }
    }
    // Do not forget to flush() and close() the streams after your job is done:
    bufferedWriter.flush();
    bufferedWriter.close();
    bufferedReader.close();
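For what it's worth, the try-with-resources suggestion and the inner loop can be combined roughly as below. Treat it as a sketch under assumptions rather than the exact code for your project: the names TagFileSketch, tagFile and tagFunction are made up for illustration, tagFunction is a stand-in for tagger.tag(...) so the example runs without the OpenNLP model file, plain String.split is used in place of WhitespaceTokenizer, and the "_" separator is an arbitrary choice. The point is that the reader and writer are each opened once, before the loop, and closed automatically. Opening a FileOutputStream or FileWriter without the append flag truncates the file, so reopening the output inside the loop discards what was written before.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.function.Function;

class TagFileSketch {

    /**
     * Reads inputPath line by line, tags the whitespace-separated tokens of
     * each line, and writes one "word_TAG" pair per output line.
     * tagFunction is a hypothetical stand-in for POSTaggerME.tag(String[]).
     */
    public static void tagFile(Path inputPath, Path outputPath,
                               Function<String[], String[]> tagFunction) throws IOException {
        // Both resources are opened once and closed automatically,
        // even if an exception is thrown inside the loop.
        try (BufferedReader reader = Files.newBufferedReader(inputPath, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(outputPath, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.isEmpty()) {
                    continue; // skip blank lines
                }
                String[] tokens = trimmed.split("\\s+");
                String[] tags = tagFunction.apply(tokens);
                // Inner loop: one word and its tag per output line.
                for (int i = 0; i < tokens.length; i++) {
                    writer.write(tokens[i] + "_" + tags[i]);
                    writer.newLine();
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path in = Files.createTempFile("file", ".txt");
        Path out = Files.createTempFile("file2", ".txt");
        Files.write(in, Arrays.asList("the cat sat", "on the mat"), StandardCharsets.UTF_8);

        // Dummy tagger for demonstration only: labels every token "X".
        tagFile(in, out, tokens -> {
            String[] tags = new String[tokens.length];
            Arrays.fill(tags, "X");
            return tags;
        });

        Files.readAllLines(out, StandardCharsets.UTF_8).forEach(System.out::println);
    }
}
```

With OpenNLP on the classpath you would presumably pass something like tokens -> tagger.tag(tokens) as the third argument instead of the dummy tagger.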
Hope this helps,
Good luck.