How to treat all-caps words as acronyms in a text file

Posted: 2013-11-13 21:41:47

Tags: java drjava

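So my program is supposed to read a text file containing tweets (one tweet per line). It should output the number of hashtags (any word starting with #) and name tags (any word starting with @). The hard part is that it should also check for abbreviations (words in all caps that do not start with @ or #), then print the abbreviations and how many there are. For example, the input is: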

OMG roommate @bob drank all the beer...#FML #ihatemondays
lost TV remote before superbowl #FML
Think @bieber is soo hawt...#marryme
seeing @linkinpark & @tswift in 2 weeks...OMG

The output should look like this:

Analyzing post:
OMG roommate @bob drank all the beer...#FML #ihatemondays
Hash tag count: 2
Name tag count: 1
Acronyms: OMG 
For a total of 1 acronym(s).

Here is my code:

import java.io.*; //defines FileNotFoundException
import java.util.Scanner; // import Scanner class

    public class TweetAnalyzer {
    public static void main (String [] args) throws FileNotFoundException{
    //variables
        String tweet;
        Scanner inputFile = new Scanner(new File("A3Q1-input.txt"));

        while (inputFile.hasNextLine())
        {
          tweet = inputFile.nextLine();
          System.out.println("Analyzing post: ");
          System.out.println("\t" + tweet);
          analyzeTweet(tweet);
        }


      }//close main 

      public static void analyzeTweet(String tweet){
        int hashtags = countCharacters(tweet, '#');
        int nametags = countCharacters(tweet, '@');
        System.out.println("Hash tag: " + hashtags);
        System.out.println("Name tag: " + nametags);
        Acronyms(tweet);

      }//close analyzeTweet

      public static int countCharacters(String tweet, char c)//char c represents both @ and # symbols
      {
        int characters = 0;
        char current;
        for(int i=0;i<tweet.length();i++)
        {
          current = tweet.charAt(i);
          if(current == c)
          {
            characters++;
          }
        }
        return characters;
      }

      public static boolean symbol(String tweet, int i) {
        boolean result = true;
        char c;
        if(i-1 >=0)
        {
          c = tweet.charAt(i - 1);
          if (c == '@' || c == '#') {
            result = false;
        }
        }//close if
        else
        {
         result = false;
        }
        return result;
      }

      public static void Acronyms (String tweet){
        char current;
        int capital = 0;
        int j = 0;
        String initials = "";


        for(int i = 0; i < tweet.length(); i++) {
          current = tweet.charAt(i);
          if(symbol(tweet, i) && current >= 'A' && current <= 'Z') {       
            initials += current;
            j = i + 1; 
            current = tweet.charAt(j);
            while(j < tweet.length() && current >= 'A' && current <= 'Z') {
              current = tweet.charAt(j);
              initials += current;
              j++;

            }
            capital++;
            i = j;
            initials += " ";
            }
          else {

            j = i + 1; 
            current = tweet.charAt(j);
            while(j < tweet.length() && current >= 'A' && current <= 'Z') {
              current = tweet.charAt(j);

              j++;

            }

            i = j;

        }
        }
         System.out.println(initials);
         System.out.println("For a total of " + capital + " acronym(s)");
    }//close Acronyms


      }//TweetAnalyzer

Everything works except the acronym part. Here is my output:

Analyzing post: 
    OMG roommate @bob drank all the beer...#FML #ihatemondays
Hash tag: 2
Name tag: 1

For a total of 0 acronym(s)
Analyzing post: 
    lost TV remote before superbowl #FML
Hash tag: 1
Name tag: 0

For a total of 0 acronym(s)
Analyzing post: 
    Think @bieber is soo hawt...#marryme
Hash tag: 1
Name tag: 1

For a total of 0 acronym(s)
Analyzing post: 
    seeing @linkinpark & @tswift in 2 weeks...OMG
Hash tag: 0
Name tag: 2
OMG 
For a total of 1 acronym(s)

Please help me fix the acronym part. Thanks.

4 Answers:

Answer 0 (score: 1)

It seems more natural to go word by word, like this:

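int hashtags = 0, names = 0, abbrevs = 0;
for (String word : tweet.trim().split("\\s+")) {
    if (word.isEmpty()) {
        continue; // an empty tweet splits into a single empty string
    }
    if (word.charAt(0) == '@') {
        names++;
    } else if (word.charAt(0) == '#') {
        hashtags++;
    } else if (word.matches(".*[A-Z].*") && word.toUpperCase().equals(word)) {
        // all caps with at least one capital letter, so "2" and "&" are not counted
        abbrevs++;
    }
}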
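Splitting on whitespace alone keeps tokens such as `beer...#FML` and `weeks...OMG` in one piece, so the hashtag and the acronym inside them are never seen. A minimal sketch of the same word-by-word idea, assuming any character other than a letter, digit, underscore, `#` or `@` may be treated as a separator (the class name `TweetWords` and the method `report` are hypothetical), that produces the output format from the question:

```java
import java.util.ArrayList;
import java.util.List;

public class TweetWords {
    // Builds the report for one tweet in the format the question asks for.
    static String report(String tweet) {
        int hashtags = 0, names = 0;
        List<String> acronyms = new ArrayList<>();
        // Assumption: anything that is not a word character, '#' or '@' separates words,
        // so "beer...#FML" yields "beer" and "#FML".
        for (String word : tweet.split("[^\\w#@]+")) {
            if (word.isEmpty()) {
                continue; // a leading separator produces one empty token
            }
            if (word.charAt(0) == '#') {
                hashtags++;
            } else if (word.charAt(0) == '@') {
                names++;
            } else if (word.matches("[A-Z]+")) {
                acronyms.add(word); // only capital letters: treated as an acronym
            }
        }
        return "Analyzing post:\n"
                + "\t" + tweet + "\n"
                + "Hash tag count: " + hashtags + "\n"
                + "Name tag count: " + names + "\n"
                + "Acronyms: " + String.join(" ", acronyms) + "\n"
                + "For a total of " + acronyms.size() + " acronym(s).";
    }

    public static void main(String[] args) {
        System.out.println(report("OMG roommate @bob drank all the beer...#FML #ihatemondays"));
        System.out.println(report("seeing @linkinpark & @tswift in 2 weeks...OMG"));
    }
}
```

For the first sample tweet this reports 2 hashtags, 1 name tag and the acronym `OMG`, matching the expected output in the question.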

Answer 1 (score: 0)

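Here's what I would do: split the tweet on whitespace so you have a list of words. Then throw out the words that contain symbols; you can use StringUtils.isAlpha for that. Now just check word.toUpperCase().equals(word). If it is true, the word is all caps with no symbols, which is what you call an acronym.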
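The steps above can be sketched as follows. `StringUtils.isAlpha` comes from Apache Commons Lang, so to keep the sketch free of dependencies it uses a plain-Java stand-in, the hypothetical helper `isAllLetters`; the class name `AlphaAcronyms` is hypothetical too.

```java
import java.util.ArrayList;
import java.util.List;

public class AlphaAcronyms {
    // Plain-Java stand-in for StringUtils.isAlpha: true only for a non-empty run of letters.
    static boolean isAllLetters(String word) {
        return !word.isEmpty() && word.chars().allMatch(Character::isLetter);
    }

    // Splits on whitespace, throws out words containing symbols,
    // and keeps the words that toUpperCase() leaves unchanged.
    static List<String> acronyms(String tweet) {
        List<String> result = new ArrayList<>();
        for (String word : tweet.trim().split("\\s+")) {
            if (isAllLetters(word) && word.toUpperCase().equals(word)) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(acronyms("lost TV remote before superbowl #FML")); // [TV]
    }
}
```

Because `weeks...OMG` contains symbols, it is thrown out as a whole, so this approach reports no acronym for the fourth sample tweet.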

Answer 2 (score: 0)

Try this method to count the acronyms:

private static int countAcronyms(String tweet) {
    int acronyms = 0;
    String[] words = tweet.split(" ");

    for (String word : words) {
        if(word.matches("[A-Z]+"))
            acronyms++;
    }

    return acronyms;
}
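Because `matches` must consume the whole token, `weeks...OMG` from the fourth sample tweet is not counted, and the method only returns a count while the question also wants the acronyms printed. A variant that swaps in `Matcher.find` over the whole tweet could look like this; the pattern encodes an assumption about what counts as a word (an uppercase run not preceded by `#`, `@` or a word character and ending at a word boundary), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexAcronyms {
    // An uppercase run not preceded by '#', '@' or a word character,
    // followed by a word boundary (space, punctuation, or end of text).
    private static final Pattern ACRONYM = Pattern.compile("(?<![#@\\w])[A-Z]+\\b");

    static List<String> findAcronyms(String tweet) {
        List<String> found = new ArrayList<>();
        Matcher m = ACRONYM.matcher(tweet);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> acronyms = findAcronyms("seeing @linkinpark & @tswift in 2 weeks...OMG");
        System.out.println("Acronyms: " + String.join(" ", acronyms));
        System.out.println("For a total of " + acronyms.size() + " acronym(s).");
    }
}
```

On the four sample tweets this finds `OMG`, `TV`, nothing, and `OMG`; the `FML` in `#FML` is skipped because of the `#` in front of it, and `Think` is skipped because its capital is followed by lowercase letters.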

Answer 3 (score: 0)

Use a StringTokenizer to split on whitespace, like this:

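import java.util.StringTokenizer;

int abbrvCount = 0;
StringTokenizer st = new StringTokenizer(yourString); // default delimiters are whitespace
while (st.hasMoreTokens()) {
    String str = st.nextToken(); // nextToken() returns a String; tokens are never empty
    char first = str.charAt(0);
    // skip hashtags and name tags, and require at least one capital letter
    if (first != '@' && first != '#'
            && str.matches(".*[A-Z].*") && str.toUpperCase().equals(str)) {
        abbrvCount++;
    }
}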

Hope this helps.