Question

I have this code, which reads a corpus of documents and stores them in a Java data structure.

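    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    public List<List<Integer>> corpus;

    corpus = new ArrayList<List<Integer>>();
    numDocuments = 0;
    numWordsInCorpus = 0;
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader br = new BufferedReader(new FileReader(pathToCorpus))) {
        int indexSentence = -1;
        int indexWord = -1;
        for (String doc; (doc = br.readLine()) != null;) {
            // skip blank lines
            if (doc.trim().length() == 0)
                continue;

            // one line is one document; its sentences are separated by tabs
            List<List<Integer>> document = new ArrayList<List<Integer>>();
            String[] sentenceStrs = doc.split("\t");
            for (String sentenceStr : sentenceStrs) {
                List<Integer> sentence = new ArrayList<Integer>();

                indexSentence += 1;
                String[] words = sentenceStr.trim().split(" ");
                for (String word : words) {
                    if (word2IdVocabulary.containsKey(word)) {
                        // known word: reuse its id
                        sentence.add(word2IdVocabulary.get(word));
                    }
                    else {
                        // new word: assign the next id and record it in both maps
                        indexWord += 1;
                        word2IdVocabulary.put(word, indexWord);
                        id2WordVocabulary.put(indexWord, word);
                        sentence.add(indexWord);
                    }
                }
                document.add(sentence);
                numWordsInCorpus += sentence.size();
            }
            numDocuments++;
            // note: addAll appends this document's sentences to corpus,
            // so each element of corpus is a sentence, not a whole document
            corpus.addAll(document);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }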

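I need to visit every document in the corpus and, for each document, read every word index. When the corpus contains thousands of documents, how can I iterate over these nested lists quickly?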
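To make the question concrete, here is a minimal sketch of two ways the nested `corpus` list could be walked. The class name `CorpusTraversal`, the helper methods, and the tiny sample data are hypothetical and only for illustration. The first method uses plain enhanced `for` loops over the `List<List<Integer>>`; the second converts the lists once to `int[][]` so that later passes avoid `Integer` unboxing. Whether the array version is noticeably faster depends on the JVM and the data, so it would need to be measured.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CorpusTraversal {

    // Walk the nested lists directly; each Integer is auto-unboxed to int.
    static long sumWithNestedLists(List<List<Integer>> corpus) {
        long sum = 0;
        for (List<Integer> inner : corpus) {
            for (int wordId : inner) {
                sum += wordId;
            }
        }
        return sum;
    }

    // One-time conversion to primitive arrays (assumes no null entries).
    static int[][] toArrays(List<List<Integer>> corpus) {
        int[][] result = new int[corpus.size()][];
        int i = 0;
        for (List<Integer> inner : corpus) {
            int[] ids = new int[inner.size()];
            int j = 0;
            for (int wordId : inner) {
                ids[j++] = wordId;
            }
            result[i++] = ids;
        }
        return result;
    }

    // Same traversal over primitive arrays, with no boxing involved.
    static long sumWithArrays(int[][] corpus) {
        long sum = 0;
        for (int[] inner : corpus) {
            for (int wordId : inner) {
                sum += wordId;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: two inner lists of word ids.
        List<List<Integer>> corpus = new ArrayList<>();
        corpus.add(Arrays.asList(0, 1, 2));
        corpus.add(Arrays.asList(1, 3));

        System.out.println(sumWithNestedLists(corpus));
        System.out.println(sumWithArrays(toArrays(corpus)));
    }
}
```

Both loops visit each word index exactly once, so the work is linear in the number of words either way; the conversion only pays off if the corpus is traversed many times after loading.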

How to iterate in Java

0 answers: