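My program should read a text file containing tweets (one tweet per line). It should output the number of hashtags (any word starting with #) and name tags (any word starting with @). The hard part: it should also detect abbreviations (all-uppercase words that do not start with @ or #), then print the abbreviations and how many there are. For example, given the input: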
OMG roommate @bob drank all the beer...#FML #ihatemondays
lost TV remote before superbowl #FML
Think @bieber is soo hawt...#marryme
seeing @linkinpark & @tswift in 2 weeks...OMG
the output should look like this:
Analyzing post:
OMG roommate @bob drank all the beer...#FML #ihatemondays
Hash tag count: 2
Name tag count: 1
Acronyms: OMG
For a total of 1 acronym(s).
Here is my code:
import java.io.*; //defines FileNotFoundException
import java.util.Scanner; // import Scanner class
public class TweetAnalyzer {
    public static void main (String [] args) throws FileNotFoundException{
        //variables
        String tweet;
        Scanner inputFile = new Scanner(new File("A3Q1-input.txt"));
        while (inputFile.hasNextLine())
        {
            tweet = inputFile.nextLine();
            System.out.println("Analyzing post: ");
            System.out.println("\t" + tweet);
            analyzeTweet(tweet);
        }
    }//close main
    public static void analyzeTweet(String tweet){
        int hashtags = countCharacters(tweet, '#');
        int nametags = countCharacters(tweet, '@');
        System.out.println("Hash tag: " + hashtags);
        System.out.println("Name tag: " + nametags);
        Acronyms(tweet);
    }//close analyzeTweet
    public static int countCharacters(String tweet, char c)//char c represents both @ and # symbols
    {
        int characters = 0;
        char current;
        for(int i=0;i<tweet.length();i++)
        {
            current = tweet.charAt(i);
            if(current == c)
            {
                characters++;
            }
        }
        return characters;
    }
    public static boolean symbol(String tweet, int i) {
        boolean result = true;
        char c;
        if(i-1 >=0)
        {
            c = tweet.charAt(i - 1);
            if (c == '@' || c == '#') {
                result = false;
            }
        }//close if
        else
        {
            result = false;
        }
        return result;
    }
    public static void Acronyms (String tweet){
        char current;
        int capital = 0;
        int j = 0;
        String initials = "";
        for(int i = 0; i < tweet.length(); i++) {
            current = tweet.charAt(i);
            if(symbol(tweet, i) && current >= 'A' && current <= 'Z') {
                initials += current;
                j = i + 1;
                current = tweet.charAt(j);
                while(j < tweet.length() && current >= 'A' && current <= 'Z') {
                    current = tweet.charAt(j);
                    initials += current;
                    j++;
                }
                capital++;
                i = j;
                initials += " ";
            }
            else {
                j = i + 1;
                current = tweet.charAt(j);
                while(j < tweet.length() && current >= 'A' && current <= 'Z') {
                    current = tweet.charAt(j);
                    j++;
                }
                i = j;
            }
        }
        System.out.println(initials);
        System.out.println("For a total of " + capital + " acronym(s)");
    }//close Acronyms
}//TweetAnalyzer
Everything works except the acronym part. Here is my output:
Analyzing post:
OMG roommate @bob drank all the beer...#FML #ihatemondays
Hash tag: 2
Name tag: 1
For a total of 0 acronym(s)
Analyzing post:
lost TV remote before superbowl #FML
Hash tag: 1
Name tag: 0
For a total of 0 acronym(s)
Analyzing post:
Think @bieber is soo hawt...#marryme
Hash tag: 1
Name tag: 1
For a total of 0 acronym(s)
Analyzing post:
seeing @linkinpark & @tswift in 2 weeks...OMG
Hash tag: 0
Name tag: 2
OMG
For a total of 1 acronym(s)
Please help me fix the acronym part. Thanks.
Answer 0 (score: 1)
Going word by word like this seems more natural:
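for (String word : tweet.split("\\s+")) {
    // split can yield an empty token (empty tweet or leading whitespace); skip it
    if (word.isEmpty()) {
        continue;
    }
    if (word.charAt(0) == '@') {
        names++;
    } else if (word.charAt(0) == '#') {
        hashtags++;
    } else if (word.toUpperCase().equals(word)) {
        // note: tokens with no letters at all, such as "2" or "&", also pass this test
        abbrevs++;
    }
}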
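The loop above is a fragment, and splitting on whitespace leaves a token such as beer...#FML in one piece, so a hashtag glued to an ellipsis would be missed. Below is a minimal self-contained sketch of the same word-by-word idea, assuming a regular expression is acceptable for tokenizing and that a "word" is a run of letters, digits, or underscores. The class name TweetTokenSketch and the method analyze are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TweetTokenSketch {
    // Optional '#' or '@' prefix, then a run of letters, digits, or underscores.
    private static final Pattern TOKEN = Pattern.compile("([#@]?)([A-Za-z0-9_]+)");

    /** Returns {hashtagCount, nameTagCount} and appends all-caps words to acronyms. */
    static int[] analyze(String tweet, List<String> acronyms) {
        int hashtags = 0;
        int names = 0;
        Matcher m = TOKEN.matcher(tweet);
        while (m.find()) {
            String prefix = m.group(1);
            String word = m.group(2);
            if (prefix.equals("#")) {
                hashtags++;
            } else if (prefix.equals("@")) {
                names++;
            } else if (word.matches("[A-Z]+")) {
                // Unprefixed word made only of capital letters
                acronyms.add(word);
            }
        }
        return new int[] {hashtags, names};
    }

    public static void main(String[] args) {
        List<String> acronyms = new ArrayList<>();
        int[] counts = analyze(
                "OMG roommate @bob drank all the beer...#FML #ihatemondays", acronyms);
        System.out.println("Hash tag count: " + counts[0]);
        System.out.println("Name tag count: " + counts[1]);
        System.out.println("Acronyms: " + String.join(" ", acronyms));
        System.out.println("For a total of " + acronyms.size() + " acronym(s).");
    }
}
```

Under these assumptions a single capital letter such as "I" also counts as an acronym; changing the last test to "[A-Z]{2,}" would exclude it.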
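Answer 1 (score: 0)
Here is what I would do: split the tweet on whitespace so you have a list of words. Then throw out any word that contains symbols; you can use StringUtils.isAlpha (from Apache Commons Lang) for that. Now just check word.toUpperCase().equals(word). If that is true, the word is all uppercase with no symbols, which is what you are calling an acronym.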
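A rough sketch of those steps using only the JDK, in case adding Apache Commons Lang is not an option. The isAlpha helper here is a hypothetical stand-in for StringUtils.isAlpha (non-empty, letters only), and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class AlphaFilterSketch {
    // Hypothetical stand-in for StringUtils.isAlpha: non-empty and letters only.
    static boolean isAlpha(String s) {
        return !s.isEmpty() && s.chars().allMatch(Character::isLetter);
    }

    static List<String> findAcronyms(String tweet) {
        List<String> acronyms = new ArrayList<>();
        // Step 1: split on whitespace
        for (String word : tweet.trim().split("\\s+")) {
            // Step 2: drop words containing symbols; step 3: keep all-uppercase words
            if (isAlpha(word) && word.toUpperCase(Locale.ROOT).equals(word)) {
                acronyms.add(word);
            }
        }
        return acronyms;
    }

    public static void main(String[] args) {
        System.out.println(findAcronyms("lost TV remote before superbowl #FML"));
    }
}
```

One consequence of discarding every word that contains a symbol: a token such as weeks...OMG is thrown away as a whole, so the OMG at the end of the fourth sample tweet is not found with this rule.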
Answer 2 (score: 0)
Try this method to count the acronyms:
private static int countAcronyms(String tweet) {
    int acronyms = 0;
    String[] words = tweet.split(" ");
    for (String word : words) {
        if(word.matches("[A-Z]+"))
            acronyms++;
    }
    return acronyms;
}
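Since the expected output also lists the acronyms themselves, and a token like weeks...OMG does not match "[A-Z]+" as a whole, a variant that scans the tweet with one regular expression might look like the sketch below. It assumes an acronym is a run of capitals that is not preceded by '#', '@', or a word character and not followed by a word character; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AcronymRegexSketch {
    // Capitals not preceded by '#', '@', or a word character,
    // and not followed by a word character.
    private static final Pattern ACRONYM =
            Pattern.compile("(?<![#@A-Za-z0-9_])[A-Z]+(?![A-Za-z0-9_])");

    static List<String> findAcronyms(String tweet) {
        List<String> found = new ArrayList<>();
        Matcher m = ACRONYM.matcher(tweet);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> found = findAcronyms("seeing @linkinpark & @tswift in 2 weeks...OMG");
        System.out.println("Acronyms: " + String.join(" ", found));
        System.out.println("For a total of " + found.size() + " acronym(s).");
    }
}
```

The lookbehind keeps #FML out of the result, and the lookahead keeps the leading T of Think from being reported.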
Answer 3 (score: 0)
Use StringTokenizer to split on whitespace, like this:
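// requires: import java.util.StringTokenizer;
StringTokenizer st = new StringTokenizer(yourString); // default delimiters are whitespace
while (st.hasMoreTokens()) {
    // nextElement() returns Object, so use nextToken() to get a String
    String str = st.nextToken();
    if (str.toUpperCase().equals(str)) {
        abbrvCount++;
    }
}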
Hope this helps.