Reading a .txt file and excluding certain elements

Time: 2015-04-11 23:24:11

Tags: java

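While finishing this program I got stuck on one method. The method I'm writing reads a given .txt file and builds a HashMap, with each word as a key and the number of times it appears as its value. I managed to get a similar method working, but this time the .txt file being read is in an odd format. It looks like this: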

more 2
morning's 1
most 3
mostly 1
mythology. 1
native 1
nearly 2
northern 1
occupying 1
of 29
off 1

and so on. Right now, the method only returns one line from the file.

Here is the code for the method:

  public static HashMap<String,Integer> readVocabulary(String fileName) {
   // Declare the HashMap to be returned
    HashMap<String, Integer> wordCount = new HashMap();
    String toRead = fileName;

     try {
      FileReader reader = new FileReader(toRead);
      BufferedReader br = new BufferedReader(reader);

      // The BufferedReader reads the lines      
      String line = br.readLine();


      // Split the line into a String array to loop through
      String[] words = line.split(" ");

      // for loop goes through every word
      for (int i = 0; i < words.length; i++) {
        // Case if the HashMap already contains the key.
        // If so, just increments the value.        
        if (wordCount.containsKey(words[i])) { 
          int n = wordCount.get(words[i]);    
          wordCount.put(words[i], ++n);
        }
        // Otherwise, puts the word into the HashMap
        else {
          wordCount.put(words[i], 1);
        }
      }
      br.close();
    }
    // Catching the file not found error
    // and any other errors
    catch (FileNotFoundException fnfe) {
      System.err.println("File not found.");
    }
    catch (Exception e) {
      System.err.print(e);
    }

    return wordCount;
  }

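The problem is that I don't know how to make the method ignore the 2, 1, and 29 in the .txt file. I tried writing an 'else if' statement to catch all of these cases, but there are too many. Is there a way to catch everything from 1 to 100 and keep those values out of the HashMap's keys? I searched online but haven't found anything useful.

Thanks for any help!

3 Answers:

Answer 0: (score: 1)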

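How about simply calling wordCount.put(words[0], 1) for each line once you've done the split? If the pattern is always "word number", you only need the first item of the split array.


Update after some back and forth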

public static HashMap<String,Integer> readVocabulary(String toRead)
{ 
    // Declare the HashMap to be returned 
    HashMap<String, Integer> wordCount = new HashMap<String, Integer>(); 

    String line = null;
    String[] words = null;
    int lineNumber = 0;
    FileReader reader = null;
    BufferedReader br = null;

    try { 
        reader = new FileReader(toRead); 
        br = new BufferedReader(reader); 

        // Split the line into a String array to loop through 
        while ((line = br.readLine()) != null) {
            lineNumber++;
            words = line.split(" "); 
            if (words.length == 2) {

                if (wordCount.containsKey(words[0]))
                { 
                    int n = wordCount.get(words[0]); 
                    wordCount.put(words[0], ++n); 
                } 
                // Otherwise, puts the word into the HashMap 
                else
                {  
                    boolean word2IsInteger = true;
                    try  
                    {  
                        Integer.parseInt(words[1]);
                    } 
                    catch(NumberFormatException nfe)  
                    {  
                        word2IsInteger = false;  
                    }
                    if (word2IsInteger) {
                        wordCount.put(words[0], Integer.parseInt(words[1])); 
                    }
                } 
            }
        } 
        br.close();
        br = null;
        reader.close();
        reader = null;
    } 
    // Catching the file not found error 
    // and any other errors 
    catch (FileNotFoundException fnfe) { 
        System.err.println("File not found."); 
    } 
    catch (Exception e) { 
        System.err.print(e); 
    } 

    return wordCount; 
}

Answer 1: (score: 1)

To check whether a string contains only digits, use String's matches() method, for example:

if (!words[i].matches("^\\d+$")){
  // NOT a String containing only digits
}

This needs no exception handling, and it doesn't matter if the number is too large to fit in an Integer.
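As a rough sketch of how that check could fit into the whole method, the version below reads every line, splits on whitespace, and skips digit-only tokens. The class name `DigitFilterSketch`, the method name `countWords`, and the choice to take a `Reader` instead of a file name are assumptions made for illustration; they are not from the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical class name, used only for this sketch
class DigitFilterSketch {

    // Counts every whitespace-separated token that is not made up purely of digits
    static Map<String, Integer> countWords(Reader source) {
        Map<String, Integer> wordCount = new HashMap<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            // Keep reading until readLine() signals the end of the input
            while ((line = br.readLine()) != null) {
                for (String token : line.trim().split("\\s+")) {
                    // Skip the empty token from blank lines and digit-only tokens such as "2" or "29"
                    if (token.isEmpty() || token.matches("\\d+")) {
                        continue;
                    }
                    // Insert 1 for a new word, otherwise add 1 to the existing count
                    wordCount.merge(token, 1, Integer::sum);
                }
            }
        } catch (IOException e) {
            // Rethrow unchecked so callers are not forced to handle IOException
            throw new UncheckedIOException(e);
        }
        return wordCount;
    }

    public static void main(String[] args) {
        String sample = "more 2\nmost 3\nof 29\nmore 1\n";
        System.out.println(countWords(new StringReader(sample)));
    }
}
```

To read from a real file, passing `new FileReader(fileName)` as the argument should work the same way. Note that `matches()` already tests the entire string, so the `^` and `$` anchors are optional.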

Answer 2: (score: 0)

Option 1: Ignore numbers separated by whitespace

Use Integer.parseInt() or Double.parseDouble() and catch the exception.

// for loop goes through every word
  for (int i = 0; i < words.length; i++) {
    try {
       int wordAsInt = Integer.parseInt(words[i]);
    } catch(NumberFormatException e) {
       // Case if the HashMap already contains the key.
       // If so, just increments the value. 
       if (wordCount.containsKey(words[i])) { 
          int n = wordCount.get(words[i]);    
          wordCount.put(words[i], ++n);
       } 
       // Otherwise, puts the word into the HashMap
       else {
          wordCount.put(words[i], 1);
       }
    }
  }

There is also a Double.parseDouble(String) method, which you can use in place of Integer.parseInt(String) above if you want to eliminate all numbers rather than just integers.
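A minimal sketch of that variant, wrapped in a helper so the loop body stays readable. The names `NumericTokenSketch` and `isNumeric` are hypothetical, chosen only for this example.

```java
// Hypothetical class name, used only for this sketch
class NumericTokenSketch {

    // Returns true when Double.parseDouble accepts the token, false otherwise
    static boolean isNumeric(String token) {
        try {
            Double.parseDouble(token);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] tokens = {"29", "3.5", "-1e3", "native", "mythology.", "NaN"};
        for (String token : tokens) {
            System.out.println(token + " -> " + isNumeric(token));
        }
    }
}
```

One thing to keep in mind: Double.parseDouble also accepts strings such as "NaN" and "Infinity", so a file containing those as ordinary words would lose them with this check.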

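Option 2: Ignore digits everywhere

Another option is to parse the input one character at a time and ignore any character that isn't a letter. When you reach whitespace, add the word built from the characters just scanned to the HashMap. Unlike the approach above, scanning character by character lets you ignore digits even when they appear right next to other characters.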
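Here is one way the character-by-character scan might look. It is a sketch that assumes the input is already in memory as a String; the names `LetterScanSketch`, `countLetterWords`, and `flush` are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class name, used only for this sketch
class LetterScanSketch {

    // Builds words from letters only; whitespace ends the current word
    static Map<String, Integer> countLetterWords(String text) {
        Map<String, Integer> wordCount = new HashMap<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(c);
            } else if (Character.isWhitespace(c)) {
                flush(current, wordCount);
            }
            // Any other character (digit, punctuation) is silently dropped
        }
        // Store the last word if the text does not end with whitespace
        flush(current, wordCount);
        return wordCount;
    }

    // Adds the pending word to the map, if there is one, and resets the buffer
    private static void flush(StringBuilder current, Map<String, Integer> wordCount) {
        if (current.length() > 0) {
            wordCount.merge(current.toString(), 1, Integer::sum);
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(countLetterWords("more 2\nmorning's 1\nmythology. 1\nmore 1"));
    }
}
```

Because every non-letter is dropped, punctuation disappears as well: "morning's" is stored as "mornings" and "mythology." as "mythology". Whether that is desirable depends on how the keys are used later.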