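I want to create my own advanced random name generator, and the only resource I could find that does more than randomly concatenate syllables is written in Ruby.
The Ruby example builds a probability table of two-letter pairs from a text file full of names, then uses that table to generate random names. The part I am trying to recreate here is the probability table.
The problem is that I have never learned Ruby. I have managed to piece some of it together, but I have hit a wall.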
I am confused by the last few lines of the example, starting with the statement frequencies.sort.each {|key, value| final[key] = (value / letter_total_count[key[0]]) }
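As far as I can tell, this line fills the "final" hash with each value of the Ruby "frequencies" hash divided by a value from a second Ruby hash, called "letter_total_count".
The trouble is that the author's comment above "letter_total_count" says "get the total count of each single letter". Since the "frequencies" hash appears to contain only two-character keys, it has a different number of entries than "letter_total_count", which makes it hard to iterate over both at the same time.
I am sure my understanding is wrong somewhere, but hopefully you can see why I am confused.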
What does this statement actually do, and how would I implement it in Java?
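For reference, here is one possible reading of that statement, sketched in Java. It assumes Ruby 1.9 or later, where key[0] on a String key is its first character. Under that reading the two hashes are never walked in parallel: "letter_total_count" is looked up once per pair, using the pair's first letter. The class name PairProbabilities and the method name normalize are made up for illustration, and the sketch leaves out the extra += 1 adjustment that the Ruby script applies to the last key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the original Ruby script.
class PairProbabilities {

    // Divides each pair count by the summed count of all pairs
    // that share the same first letter.
    static Map<String, Double> normalize(Map<String, Double> frequencies) {
        // Rough equivalent of letter_total_count: keyed by the first letter only.
        Map<Character, Double> letterTotalCount = new HashMap<>();
        for (Map.Entry<String, Double> e : frequencies.entrySet()) {
            letterTotalCount.merge(e.getKey().charAt(0), e.getValue(), Double::sum);
        }

        // TreeMap keeps the keys sorted, similar in spirit to frequencies.sort.
        Map<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, Double> e : frequencies.entrySet()) {
            double total = letterTotalCount.get(e.getKey().charAt(0));
            result.put(e.getKey(), e.getValue() / total);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> frequencies = new HashMap<>();
        frequencies.put("ab", 3.0);
        frequencies.put("ac", 1.0);
        frequencies.put("ba", 2.0);
        System.out.println(normalize(frequencies)); // {ab=0.75, ac=0.25, ba=1.0}
    }
}
```

Storing the counts as Double mirrors Hash.new(0.0) in the Ruby script and avoids integer division.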
I found the original Ruby example on this blog page.
The original example in Ruby:
# simple scripting to generate probability tables from a file
# PARAMETERS
# this script takes two parameters, the first is the input file
# and the second is the output file
require "yaml"
input_file = ARGV[0]
output_file = ARGV[1]
# treat all letters as well as spaces
chars = ('a'..'z').to_a.push(' ')
last_char_read = " "
frequencies = Hash.new(0.0)
# parse the file to read letter pair frequencies
File.open(input_file) do |file|
  while char = file.getc
    if ('a'..'z').to_a.include?(char.downcase)
      if chars.include?(last_char_read.downcase)
        frequencies[last_char_read.downcase + char.downcase] += 1
      end
    end
    last_char_read = char
  end
end
# get the total count of each single letter
letter_total_count = Hash.new(0.0)
frequencies.each {|key, value| letter_total_count[key[0]] += value}
letter_total_count[frequencies.keys.last[1]] += 1
# the final hash will contain our, ahem, final result
final = Hash.new(0.0)
frequencies.sort.each {|key, value| final[key] = (value / letter_total_count[key[0]]) }
# make a running total
chars.each do |first_letter|
  running_total = 0.0
  ('a'..'z').each do |second_letter|
    if final.key? first_letter + second_letter
      original_value = final[first_letter + second_letter]
      final[first_letter + second_letter] += running_total
      running_total += original_value
    end
  end
end
# output to file for later use
File.open(output_file, "w") {|file| file.puts YAML::dump(final)}
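The "make a running total" loop near the end of the script appears to turn the per-letter probabilities into cumulative values, so that the entries for each first letter climb towards 1.0. A minimal Java sketch of that step follows. It assumes the probabilities are held in a map keyed by two-character strings. RunningTotals, accumulate, and FIRST_LETTERS are illustrative names, not taken from the original.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper mirroring the "make a running total" loop.
class RunningTotals {

    // Same first-letter alphabet as the Ruby script: a to z plus the space.
    static final String FIRST_LETTERS = "abcdefghijklmnopqrstuvwxyz ";

    static Map<String, Double> accumulate(Map<String, Double> probabilities) {
        // Work on a copy so the caller's map is left untouched.
        Map<String, Double> cumulative = new TreeMap<>(probabilities);
        for (char first : FIRST_LETTERS.toCharArray()) {
            double runningTotal = 0.0;
            for (char second = 'a'; second <= 'z'; second++) {
                String key = "" + first + second;
                Double original = cumulative.get(key);
                if (original != null) {
                    // Store this pair's probability plus everything before it.
                    cumulative.put(key, original + runningTotal);
                    runningTotal += original;
                }
            }
        }
        return cumulative;
    }

    public static void main(String[] args) {
        Map<String, Double> probabilities = new TreeMap<>();
        probabilities.put("ab", 0.5);
        probabilities.put("ac", 0.25);
        probabilities.put("ad", 0.25);
        System.out.println(accumulate(probabilities)); // {ab=0.5, ac=0.75, ad=1.0}
    }
}
```

With cumulative values, a generator could presumably draw a random number between 0 and 1 and pick the first pair whose value is at least that number, although that step is not part of the script shown here.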
My attempt at a Java version:
import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Scanner;

public class MCVE {
    public static void main(String[] args) throws IOException {
        String addr = "./res/text_files/";
        File file = new File(addr + "human_name_samples");
        String text = new Scanner(file).useDelimiter("\\Z").next();
        text = text.replaceAll("[^a-zA-Z]", " ").toLowerCase();
        char[] charArr = text.toCharArray();

        // read letter pair frequencies
        HashMap<char[], Integer> frequencies = new HashMap<>();
        char lastCharRead = ' ';
        for (int i = 0; i < charArr.length; i++) {
            char[] tempArr = {lastCharRead, charArr[i]};
            if (frequencies.containsKey(tempArr))
                frequencies.replace(tempArr, frequencies.get(tempArr) + 1);
            else
                frequencies.put(tempArr, 1);
            lastCharRead = charArr[i];
        }

        // get the total count of each single letter
        HashMap<Character, Integer> letterTotalCount = new HashMap<>();
        for (HashMap.Entry<char[], Integer> entry : frequencies.entrySet()) {
            char[] tempArr = entry.getKey();
            if (letterTotalCount.containsKey(tempArr[0]))
                letterTotalCount.replace(tempArr[0], letterTotalCount.get(tempArr[0]) + 1);
            else
                letterTotalCount.put(tempArr[0], 1);
            if (letterTotalCount.containsKey(tempArr[1]))
                letterTotalCount.replace(tempArr[1], letterTotalCount.get(tempArr[1]) + 1);
            else
                letterTotalCount.put(tempArr[1], 1);
        }

        // holds final result
        HashMap<char[], Double> finalMap = new HashMap<>();
        for (HashMap.Entry<char[], Integer> entry : frequencies.entrySet()) {
            char[] tempArr = entry.getKey();
            // this is where I'm confused.
            // trying to copy `frequencies.sort.each {|key, value| final[key] = (value / letter_total_count[key[0]]) }`
            double tempDouble = entry.getValue() / letterTotalCount.get(entry.getKey());
            finalMap.put(tempArr, tempDouble);
        }
    }
}
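One thing worth noting about the attempt above: Java arrays inherit identity-based equals and hashCode from Object, so two distinct char[] instances holding the same letters are treated as different HashMap keys, and containsKey(tempArr) never matches a freshly built array. A minimal sketch of the counting step that uses two-character String keys instead is shown here. It tries to mirror the Ruby loop, counting a pair only when the current character is a letter and the previous one is a letter or a space. PairCounter and countPairs are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: counts letter pairs using String keys,
// which compare by content rather than by identity.
class PairCounter {

    static Map<String, Double> countPairs(String text) {
        Map<String, Double> frequencies = new HashMap<>();
        char last = ' '; // same starting value as last_char_read in the Ruby script
        for (char raw : text.toCharArray()) {
            char current = Character.toLowerCase(raw);
            char previous = Character.toLowerCase(last);
            boolean currentIsLetter = current >= 'a' && current <= 'z';
            boolean previousAllowed =
                    (previous >= 'a' && previous <= 'z') || previous == ' ';
            if (currentIsLetter && previousAllowed) {
                // merge inserts 1.0 for a new key or adds 1.0 to an existing count.
                frequencies.merge("" + previous + current, 1.0, Double::sum);
            }
            last = raw;
        }
        return frequencies;
    }

    public static void main(String[] args) {
        // Expected counts for "aaa": " a" once, "aa" twice.
        System.out.println(countPairs("aaa"));
    }
}
```

Using Double counts keeps the later division in floating point, which matches the Hash.new(0.0) default in the Ruby script.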