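How would you evaluate the following solution to this task in terms of structure, correctness, simplicity, and testability? (The task was time-boxed to about one hour.)
Create a command-line Java program that counts the unique words in a text file and lists the top 10 occurrences.
Use an English locale and treat hyphens and apostrophes as part of a word. The output should look like this: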
```
and (514)
the (513)
i (446)
to (324)
a (310)
of (295)
my (288)
you (211)
that (188)
this (185)
```
Solution:
WordCalculator.java (main class)
```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.logging.Level;
import java.util.logging.Logger;

public class WordCalculator {

    /**
     * Counts unique words from a text file and lists the top 10 occurrences.
     *
     * @param args the command line arguments. The first argument is the file path.
     * If omitted, the user will be prompted to specify the path.
     *
     * @throws java.io.FileNotFoundException if the file does not exist, is a
     * directory rather than a regular file, or for some other reason cannot be
     * opened for reading.
     *
     * @throws java.io.IOException If an I/O error occurs
     */
    public static void main(String[] args) throws FileNotFoundException, IOException {
        File file;
        List<String> listOfWords = new ArrayList<>();
        // If a command-line argument is specified, use it as the file path.
        // Otherwise prompt the user for the path.
        if (args.length > 0) {
            file = new File(args[0]);
        } else {
            Scanner scanner = new Scanner(System.in);
            System.out.print("Enter path to file: ");
            file = new File(scanner.nextLine());
        }
        // Reads the file and splits the input into a list of words
        try (BufferedReader br = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = br.readLine()) != null) {
                listOfWords.addAll(WordUtil.getWordsFromString(line));
            }
        } catch (FileNotFoundException ex) {
            Logger.getLogger(WordCalculator.class.getName()).log(Level.SEVERE,
                    String.format("Access denied reading from file '%s'.", file.getAbsolutePath()), ex);
            throw ex;
        } catch (IOException ex) {
            Logger.getLogger(WordCalculator.class.getName()).log(Level.SEVERE,
                    "I/O error while reading input file.", ex);
            throw ex;
        }
        // Retrieves the top ten frequent words and their frequencies.
        Map<Object, Long> freqMap = FrequencyUtil.getItemFrequencies(listOfWords);
        List<Map.Entry<?, Long>> topTenWords = FrequencyUtil.limitFrequency(freqMap, 10);
        // Prints the top ten words and their frequencies.
        topTenWords.forEach((word) -> {
            System.out.printf("%s (%d)\r\n", word.getKey(), word.getValue());
        });
    }
}
```
FrequencyUtil.java
```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FrequencyUtil {

    /**
     * Transforms a list into a map with elements and their frequencies.
     *
     * @param list the list to parse
     * @return the item-frequency map.
     */
    public static Map<Object, Long> getItemFrequencies(List<?> list) {
        return list.stream()
                .collect(Collectors.groupingBy(obj -> obj, Collectors.counting()));
    }

    /**
     * Sorts a frequency map in descending order and limits the list.
     *
     * @param objFreq the map of elements and their frequencies.
     * @param limit the maximum size of the returned list
     * @return a list with the most frequent words
     */
    public static List<Map.Entry<?, Long>> limitFrequency(Map<?, Long> objFreq, int limit) {
        return objFreq.entrySet().stream()
                .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```
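One correctness detail worth checking in limitFrequency: it sorts by count only, and Collectors.groupingBy makes no guarantee about the type or iteration order of the map it returns, so words that tie on count can come out in any order (and a tie at tenth place decides which word is printed at all). Below is a minimal sketch with an alphabetical tie-break; the class name TopWordsTieBreak and the String-typed signatures are assumptions for illustration, not the question's code.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical class: same pipeline as FrequencyUtil, typed to String, with a tie-break.
public class TopWordsTieBreak {

    // Counts each word, like FrequencyUtil.getItemFrequencies.
    public static Map<String, Long> frequencies(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // Descending by count; equal counts are ordered alphabetically, so the result is deterministic.
    public static List<Map.Entry<String, Long>> top(Map<String, Long> freq, int limit) {
        return freq.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("b", "a", "c", "a", "b", "d");
        top(frequencies(words), 3)
                .forEach(e -> System.out.printf("%s (%d)%n", e.getKey(), e.getValue()));
    }
}
```

With the sample list, "a" and "b" occur twice and "c" and "d" once, so it prints `a (2)`, `b (2)`, `c (1)` regardless of how the underlying map happens to iterate.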
WordUtil.java
```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordUtil {

    public static final Pattern ENGLISH_WORD_PATTERN = Pattern.compile("[A-Za-z'\\-]+");

    /**
     * Splits a string into lowercase words.
     *
     * @param s the string to parse into a list of words. Words not matching the
     * English pattern (a-z A-Z ' -) will be omitted.
     *
     * @return a list of the words
     */
    public static List<String> getWordsFromString(String s) {
        ArrayList<String> list = new ArrayList<>();
        Matcher matcher = ENGLISH_WORD_PATTERN.matcher(s);
        while (matcher.find()) {
            list.add(matcher.group().toLowerCase());
        }
        return list;
    }
}
```
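Because getWordsFromString is a pure function, it is easy to probe directly, which speaks well for testability. The sketch below copies the same pattern and loop into a hypothetical WordPatternDemo class and runs it on a made-up line: apostrophes and hyphens stay inside words as the task requires, but surrounding single quotes and a double hyphen used as a dash are kept as well, so `'hello'--it's` ends up as one token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class reproducing WordUtil's behavior.
public class WordPatternDemo {
    // Same pattern as WordUtil.ENGLISH_WORD_PATTERN in the question.
    static final Pattern ENGLISH_WORD_PATTERN = Pattern.compile("[A-Za-z'\\-]+");

    // Same logic as WordUtil.getWordsFromString: every match, lowercased.
    public static List<String> words(String s) {
        List<String> list = new ArrayList<>();
        Matcher matcher = ENGLISH_WORD_PATTERN.matcher(s);
        while (matcher.find()) {
            list.add(matcher.group().toLowerCase());
        }
        return list;
    }

    public static void main(String[] args) {
        // Made-up sample line with quotes and a double hyphen used as a dash.
        System.out.println(words("Don't say 'hello'--it's well-known."));
        // prints [don't, say, 'hello'--it's, well-known]
    }
}
```

Separately, WordUtil calls toLowerCase() with the JVM's default locale; String.toLowerCase(Locale.ENGLISH) would pin case folding to the English locale the task mentions (under a Turkish default locale, for example, "I" lowercases to a dotless "ı").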
Answer 0 (score: 2)
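Your solution is correct, but if you are looking for a less functional and more object-oriented solution, you should avoid utility classes made of static methods. Instead, you could give WordCalculator instance methods and a field, such as a map holding the word counts. Also, regular-expression matching is comparatively expensive, and you first collect the split words into a list and then loop over them again (in a functional style) to build the frequency map. Another option is to read the file character by character: whenever you reach a character that cannot be part of a word (for a simple text file, checking for whitespace would be enough), flush the contents of the StringBuilder into the map and increment that word's counter. This also avoids a possible problem if the file is one huge line of text.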
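```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class WordCalculator {
    // Assumed field (hypothetical name): the word counts kept as instance state.
    private final Map<String, Integer> wordMap = new HashMap<>();

    private void readWords(File file) {
        try (BufferedReader bufferedReader = new BufferedReader(new FileReader(file))) {
            StringBuilder build = new StringBuilder();
            int value;
            // Read one character at a time, so a single huge line is never held in memory.
            while ((value = bufferedReader.read()) != -1) {
                // Letters, apostrophes and hyphens are part of a word, as the task requires.
                if (Character.isLetter(value) || value == '\'' || value == '-') {
                    build.append((char) Character.toLowerCase(value));
                } else if (build.length() > 0) {
                    // Any other character ends the current word.
                    addtoWordMap(build.toString());
                    build = new StringBuilder();
                }
            }
            // Flush the last word if the file does not end with a separator.
            if (build.length() > 0) {
                addtoWordMap(build.toString());
            }
        } catch (FileNotFoundException e) {
            //todo manage exception
            e.printStackTrace();
        } catch (IOException e) {
            //todo manage exception
            e.printStackTrace();
        }
    }

    // Assumed body (not shown in the answer): increments the word's counter.
    private void addtoWordMap(String word) {
        wordMap.merge(word, 1, Integer::sum);
    }
}
```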
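Putting the answer's suggestions together (instance state, character-by-character reading, no regex), a minimal self-contained sketch might look like this. The class name WordCounter, the isWordChar and add helpers, and the UTF-8 charset are assumptions for illustration, not part of the original answer; taking a Reader instead of a File is also a swapped-in choice, made so the counting logic can be tested with a StringReader without touching the file system.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical class name; keeps the counts as instance state, as the answer suggests.
public class WordCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    // Letters, apostrophes and hyphens count as word characters (task requirement).
    private static boolean isWordChar(int c) {
        return Character.isLetter(c) || c == '\'' || c == '-';
    }

    // Reads one character at a time, so memory use does not grow with line length.
    // Pass a buffered Reader when reading from a file.
    public void readWords(Reader reader) throws IOException {
        StringBuilder word = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            if (isWordChar(c)) {
                word.append((char) Character.toLowerCase(c));
            } else if (word.length() > 0) {
                add(word.toString());
                word.setLength(0); // reuse the builder for the next word
            }
        }
        if (word.length() > 0) {
            add(word.toString()); // last word when the input ends without a separator
        }
    }

    private void add(String word) {
        counts.merge(word, 1, Integer::sum);
    }

    // Most frequent words first; equal counts are ordered alphabetically.
    public List<Map.Entry<String, Integer>> top(int limit) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        WordCounter counter = new WordCounter();
        if (args.length > 0) {
            // UTF-8 is an assumption; the question's FileReader uses the platform default.
            try (Reader r = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
                counter.readWords(r);
            }
        } else {
            counter.readWords(new StringReader("The cat's toy; the well-known cat."));
        }
        counter.top(10).forEach(e -> System.out.printf("%s (%d)%n", e.getKey(), e.getValue()));
    }
}
```

Run without arguments, it counts the built-in sample sentence and prints `the (2)` followed by `cat`, `cat's`, `toy` and `well-known` with a count of 1 each; with a file path as the first argument it reads that file instead.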