Reading an article and tokenizing it

Date: 2018-10-08 15:26:59

Tags: java token filereader tokenize

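The goal is to read a file, tokenize every word in the article, and store the tokens back in a field, so that later I can convert them into an array in another class and remove some words from it. The problem is that I don't know whether my code actually reads and tokenizes the article correctly. I'm also not sure whether a String is the right type for storing the article after it has been read and tokenized.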

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.StringTokenizer;

public class Articles{

    private String article;

    public Articles() {
        article = "";
    }

    public String getArticle(){
        return article;
    }

    public void readArticle(String file) throws Exception{
        BufferedReader br = new BufferedReader(new FileReader(file));
        String words;
        while((words = br.readLine()) != null) {
            article = words;   // each line replaces the previous value of article
            getArticle();      // return value is discarded
        }
    }

    public void tokenize() {
        StringTokenizer strt = new StringTokenizer(article);
        while (strt.hasMoreTokens()) {
            article = strt.nextToken();   // each token replaces the previous one
            getArticle();
        }
    }

    public void print() {
        System.out.println(article);
    }
}
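One way to check whether the tokenizing step does what you expect is to run the tokenizer on a known line and print the result. The following is a minimal sketch using a hypothetical helper class `TokenCheck` (not part of the question's code). It shows that `StringTokenizer` with no delimiter argument splits on whitespace only (space, tab, newline, carriage return, form feed), so punctuation stays attached to words. It also shows `String.split`, which the `StringTokenizer` Javadoc recommends for new code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper class, used only to inspect what tokenizing produces.
class TokenCheck {

    // Splits on StringTokenizer's default delimiters: space, tab, newline,
    // carriage return and form feed. Punctuation stays attached to words.
    static List<String> withTokenizer(String line) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    // Same idea with String.split. The line is trimmed first so that leading
    // whitespace does not produce an empty first element.
    static List<String> withSplit(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();   // "".split(...) would return [""]
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    public static void main(String[] args) {
        String line = "  Hello, world!\tThis is Java.  ";
        System.out.println(withTokenizer(line)); // [Hello,, world!, This, is, Java.]
        System.out.println(withSplit(line));     // [Hello,, world!, This, is, Java.]
    }
}
```

For ordinary text both approaches give the same tokens. If punctuation should not end up inside tokens, a different delimiter set or regular expression would be needed.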

1 answer:

Answer 0 (score: 0)

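Here is an example of what you could do: collect the tokens in a list (an ArrayList) instead of assigning each one to the article string.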

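import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class Articles {

    private String article;        // full text of the file
    private List<String> tokens;   // every word, in reading order

    public Articles() {
        article = "";
        tokens = new ArrayList<String>();
    }

    public String getArticle() {
        return article;
    }

    public List<String> getTokens() {
        return tokens;
    }

    public void readArticle(String file) throws IOException {
        StringBuilder text = new StringBuilder();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = br.readLine()) != null) {
                text.append(line).append('\n');   // keep every line, not just the last
            }
        }
        article = text.toString();
        tokenize();
    }

    public void tokenize() {
        tokens.clear();   // avoid duplicates if tokenize() is called again
        StringTokenizer strt = new StringTokenizer(article);
        while (strt.hasMoreTokens()) {
            tokens.add(strt.nextToken());
        }
    }

    public void print() {
        System.out.println(article);
    }
}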
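The question also mentions converting the tokens to an array in another class and removing some words. A minimal sketch of that step follows, assuming a hypothetical class `WordFilter` with a hypothetical method `removeWords`, and a caller-supplied set of lowercase words to drop. It copies the token list, removes matching words with `removeIf`, and converts the result with `toArray`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical "other class" from the question: it receives the token list,
// removes unwanted words and hands back a plain array.
class WordFilter {

    // Returns the tokens that are not in wordsToRemove, as a String[].
    // wordsToRemove is assumed to contain lowercase words; each token is
    // lowercased before the lookup, so matching is case-insensitive.
    // The caller's list is not modified.
    static String[] removeWords(List<String> tokens, Set<String> wordsToRemove) {
        List<String> kept = new ArrayList<>(tokens);   // work on a copy
        kept.removeIf(t -> wordsToRemove.contains(t.toLowerCase()));
        return kept.toArray(new String[0]);
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("The", "cat", "sat", "on", "the", "mat");
        Set<String> toRemove = new HashSet<>(Arrays.asList("the", "on"));
        System.out.println(Arrays.toString(removeWords(tokens, toRemove))); // [cat, sat, mat]
    }
}
```

With an `Articles` instance (here a hypothetical variable `articles`), this could be called as `WordFilter.removeWords(articles.getTokens(), toRemove)`. Because `StringTokenizer` keeps punctuation attached, a token such as `mat.` would not match `mat`.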