I need some help with this code. I want my program to count the frequency of each word that matches the pattern below.
public class Project {
public static void main(String[] args) throws FileNotFoundException{
Scanner INPUT_TEXT = new Scanner(new File("moviereview.txt")).useDelimiter(" ");
String pattern = "[a-zA-Z'-]+";
Pattern r = Pattern.compile(pattern);
int occurences=0;
while(INPUT_TEXT.hasNext()){
//read next word
String Stringcandidate=INPUT_TEXT.next();
//see if pattern matches (boolean find)
if(r.matcher(Stringcandidate).find()) {
occurences++; //increment occurences if pattern is found
String moviereview = m.group(0); //retrieve found string
String moviereview2 = moviereview.toLowerCase(); // ???
System.out.println(moviereview2 + " appears " + occurences);
if(occurences>1){
System.out.println(" times\n");
}
else{
System.out.println(" time\n");
}
}
INPUT_TEXT.close();//Close your Scanner.
}
}
Answer 0 (score: 1)
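As stated in my earlier comment, you can use a Map (HashMap) to store the matched words together with their occurrence counts.
I suggest encapsulating the program's functionality in smaller methods/classes so that each one performs a single small task. That makes the code easier to read.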
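The counting idea on its own can be sketched in a few lines. This is only a minimal illustration, not the code from this answer: the class name `WordCountSketch` and the helper `countWords` are made up for the example, and it uses `Map.merge` (available since Java 8) in place of an explicit `containsKey` check.

```java
import java.util.HashMap;
import java.util.Map;

public class WordCountSketch {

    // Hypothetical helper: counts how often each word occurs in the array.
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            // merge() stores 1 for a new key, otherwise adds 1 to the stored count
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords(new String[] {"auto", "bush", "auto"});
        System.out.println(counts.get("auto")); // prints 2
        System.out.println(counts.get("bush")); // prints 1
    }
}
```

The single `merge` call does the same job as an if/else around `containsKey`, `get` and `put`; which form reads better is a matter of taste.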
I assume your file contains the string "auto bush trumped her tomato in the petunia auto".
Here is the code:
package how_to_calculate_the_frequency;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Project {

    HashMap<String, Integer> map = new HashMap<String, Integer>();

    public static void main(String[] args) {
        Project project = new Project();
        Scanner INPUT_TEXT = project.readFile();
        project.analyse(INPUT_TEXT);
        project.showResults();
    }

    /**
     * logic to count the occurrences of words matched by REGEX in a scanner that
     * loaded some text
     *
     * @param scanner
     *            the scanner holding the text
     */
    public void analyse(Scanner scanner) {
        String pattern = "[a-zA-Z'-]+";
        Pattern r = Pattern.compile(pattern);
        while (scanner.hasNext()) {
            // read next word
            String Stringcandidate = scanner.next();
            // see if pattern matches (boolean find)
            Matcher matcher = r.matcher(Stringcandidate);
            if (matcher.find()) {
                String matchedWord = matcher.group();
                // System.out.println(matchedWord); // check what is matched
                this.addWord(matchedWord);
            }
        }
        scanner.close(); // Close your Scanner.
    }

    /**
     * adds a word to the <word,count> Map: if the word is new, a new entry is
     * created, otherwise the count of this word is incremented
     */
    public void addWord(String matchedWord) {
        if (map.containsKey(matchedWord)) {
            // increment occurrence
            int occurrence = map.get(matchedWord);
            occurrence++;
            map.put(matchedWord, occurrence);
        } else {
            // add word and set occurrence to 1
            map.put(matchedWord, 1);
        }
    }

    /**
     * reads a file from disk and returns a scanner to analyse it
     *
     * @return the file from disk as scanner
     */
    public Scanner readFile() {
        Scanner scanner = null;
        /* use this for reading a file from disk:
         * try { scanner = new Scanner(new
         * File("moviereview.txt")).useDelimiter(" "); } catch (Exception e) {
         * e.printStackTrace(); }
         */
        scanner = new Scanner("auto bush trumped her tomato in the petunia auto");
        return scanner;
    }

    /**
     * prints the matched words and their occurrences
     * in a readable way
     */
    public void showResults() {
        for (HashMap.Entry<String, Integer> matchedWord : map.entrySet()) {
            int occurrence = matchedWord.getValue();
            System.out.print("\"" + matchedWord.getKey() + "\" appears " + occurrence);
            if (occurrence > 1) {
                System.out.print(" times\n");
            } else {
                System.out.print(" time\n");
            }
        }
        // or, using a Java 8 lambda expression:
        // map.forEach((word, occurrence) ->
        //         System.out.println("\"" + word + "\" appears " + occurrence + " times"));
    }
}
// DONE: separate reading the file, analysing the file, and the
// word-frequency counting logic into different methods
// DONE: implement the <word,count> Map and the logic to add new and
// already known (to the map) words
This produces:
"the" appears 1 time
"auto" appears 2 times
"her" appears 1 time
"in" appears 1 time
"bush" appears 1 time
"trumped" appears 1 time
"tomato" appears 1 time
"petunia" appears 1 time
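The words above come out in no particular order because a HashMap does not guarantee any iteration order. If sorted output is wanted, a TreeMap can be swapped in. The following is a rough sketch under a few assumptions, not part of the code above: the class name `SortedWordCountSketch` and the method `countWords` are invented for the example, words are lower-cased before counting (as the question's `toLowerCase()` call suggests was intended), and `find()` is called in a loop so that every match in the text is counted, not just the first one per token.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SortedWordCountSketch {

    // Same pattern as in the question: letters, apostrophes and hyphens
    private static final Pattern WORD = Pattern.compile("[a-zA-Z'-]+");

    // Hypothetical helper: counts every match in the text, case-insensitively.
    static Map<String, Integer> countWords(String text) {
        // TreeMap keeps its keys sorted, so results print alphabetically
        Map<String, Integer> counts = new TreeMap<>();
        Matcher matcher = WORD.matcher(text);
        // find() in a loop visits every match, not only the first one
        while (matcher.find()) {
            String word = matcher.group().toLowerCase(Locale.ROOT);
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                countWords("Auto bush trumped her tomato in the petunia auto");
        counts.forEach((word, n) ->
                System.out.println("\"" + word + "\" appears " + n + (n > 1 ? " times" : " time")));
    }
}
```

Matching against the whole text also removes the need for a space delimiter on the Scanner, since punctuation attached to a word is simply skipped by the pattern.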
Regards