Converting a Ruby probability table generator for characters to Java

Time: 2018-12-24 18:20:50

Tags: java ruby probability code-translation procedural-generation

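I want to create my own advanced random name generator, and the only resource I could find that does more than randomly juxtapose syllables is written in Ruby.

The Ruby example builds a probability table of two-letter pairs from a text file full of names, and then uses that table to generate random names. The probability table is the part I am trying to create here.

The problem is that I have never learned Ruby. I can piece some of it together, but I have hit a wall.

I am confused by the last few lines of the example, starting with the statement frequencies.sort.each {|key, value| final[key] = (value / letter_total_count[key[0]]) }.

As far as I can tell, this line fills the "final" hash by storing each value of the Ruby "frequencies" hash divided by a value from another Ruby hash. That second hash is called "letter_total_count".

The problem is that the author's comment above "letter_total_count" says "get the total count of each single letter". Since the "frequencies" hash seems to contain only two-character keys, it has a different number of entries than the "letter_total_count" hash. That makes it hard to iterate over both objects at the same time.

I am sure my understanding is wrong somewhere, but hopefully you can see why I am confused.

What does the statement in question actually do, and how would I implement it in Java?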
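For reference, here is a minimal Java sketch of one reading of that statement. It assumes the pair counts are held in a Map<String, Double> keyed by two-character strings; the class name PairNormalizer and the method name normalize are hypothetical. Under this reading the two hashes are not iterated in parallel: key[0] is the first character of each two-letter key, and it is used to look up a single total in "letter_total_count". The sketch leaves out the script's extra adjustment line (letter_total_count[frequencies.keys.last[1]] += 1).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class PairNormalizer {
    // Divides each pair count by the total count of all pairs that share
    // the same first character (Ruby: value / letter_total_count[key[0]]).
    static Map<String, Double> normalize(Map<String, Double> frequencies) {
        // Total per first character; key.charAt(0) plays the role of key[0].
        Map<Character, Double> letterTotalCount = new HashMap<>();
        for (Map.Entry<String, Double> e : frequencies.entrySet()) {
            letterTotalCount.merge(e.getKey().charAt(0), e.getValue(), Double::sum);
        }

        // A TreeMap keeps the keys sorted, similar to frequencies.sort in Ruby.
        Map<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, Double> e : frequencies.entrySet()) {
            double total = letterTotalCount.get(e.getKey().charAt(0));
            result.put(e.getKey(), e.getValue() / total);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> frequencies = new HashMap<>();
        frequencies.put("ab", 3.0);
        frequencies.put("ac", 1.0);
        frequencies.put("ba", 2.0);
        // prints {ab=0.75, ac=0.25, ba=1.0}
        System.out.println(normalize(frequencies));
    }
}
```

In the made-up sample data, "ab" and "ac" share the first letter "a", whose total is 4, so their values become 3/4 and 1/4.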

The original Ruby example was found on this blog page.

The original example in Ruby:

# simple scripting to generate probability tables from a file

# PARAMETERS
# this script takes two parameters, the first is the input file 
# and the second is the output file

require "yaml"

input_file = ARGV[0]
output_file = ARGV[1]

# treat all letters as well as spaces
chars = ('a'..'z').to_a.push(' ')

last_char_read = " "
frequencies = Hash.new(0.0)

# parse the file to read letter pair frequencies
File.open(input_file) do |file|
    while char = file.getc
        if ('a'..'z').to_a.include?(char.downcase)
            if chars.include?(last_char_read.downcase)
                frequencies[last_char_read.downcase + char.downcase] += 1
            end
        end

        last_char_read = char
    end
end

# get the total count of each single letter
letter_total_count = Hash.new(0.0)
frequencies.each {|key, value| letter_total_count[key[0]] += value}  
letter_total_count[frequencies.keys.last[1]] += 1

# the final hash will contain our, ahem, final result
final = Hash.new(0.0)
frequencies.sort.each {|key, value| final[key] = (value / letter_total_count[key[0]])  }  

# make a running total 
chars.each do |first_letter|
    running_total = 0.0

    ('a'..'z').each do |second_letter| 
        if final.key? first_letter + second_letter
            original_value = final[first_letter + second_letter] 
            final[first_letter + second_letter] += running_total
            running_total += original_value
        end
    end
end

# output to file for later use
File.open(output_file, "w") {|file| file.puts YAML::dump(final)}
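The "make a running total" loop in the script can be sketched in Java as follows. This is only an illustration, assuming the per-pair probabilities are already in a Map<String, Double>; the names RunningTotals and toCumulative are hypothetical. For each first character it walks the second letters from a to z and replaces every stored probability with the cumulative sum so far, presumably so that a generator can later pick a pair by comparing a random number against these thresholds.

```java
import java.util.Map;
import java.util.TreeMap;

class RunningTotals {
    // Turns per-pair probabilities into cumulative values per first character,
    // mirroring the "make a running total" loop of the Ruby script.
    static Map<String, Double> toCumulative(Map<String, Double> probabilities) {
        Map<String, Double> result = new TreeMap<>(probabilities);
        // Same first characters as the Ruby 'chars' array: a..z plus space.
        String firstLetters = "abcdefghijklmnopqrstuvwxyz ";
        for (char first : firstLetters.toCharArray()) {
            double runningTotal = 0.0;
            for (char second = 'a'; second <= 'z'; second++) {
                String pair = "" + first + second;
                Double original = probabilities.get(pair);
                if (original != null) {
                    // Equivalent to: final[pair] += running_total;
                    //                running_total += original_value
                    runningTotal += original;
                    result.put(pair, runningTotal);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> probabilities = new TreeMap<>();
        probabilities.put("ab", 0.75);
        probabilities.put("ac", 0.25);
        probabilities.put("ba", 1.0);
        // prints {ab=0.75, ac=1.0, ba=1.0}
        System.out.println(toCumulative(probabilities));
    }
}
```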

My attempt at a Java version:

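import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Scanner;

public class MCVE {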

public static void main(String[] args) throws IOException {
    String addr = "./res/text_files/";

    File file = new File(addr + "human_name_samples");
    String text = new Scanner(file).useDelimiter("\\Z").next();
    text = text.replaceAll("[^a-zA-Z]", " ").toLowerCase();
    char[] charArr = text.toCharArray();

    // read letter pair frequencies
    HashMap<char[], Integer> frequencies = new HashMap<>();
    char lastCharRead = ' ';
    for (int i = 0; i < charArr.length; i++) {
        char[] tempArr = {lastCharRead, charArr[i]};

        if (frequencies.containsKey(tempArr))
            frequencies.replace(tempArr, frequencies.get(tempArr) + 1);
        else
            frequencies.put(tempArr, 1);

        lastCharRead = charArr[i];
    }

    // get the total count of each single letter
    HashMap<Character, Integer> letterTotalCount = new HashMap<>();
    for (HashMap.Entry<char[], Integer> entry : frequencies.entrySet()) {
        char[] tempArr = entry.getKey();

        if (letterTotalCount.containsKey(tempArr[0]))
            letterTotalCount.replace(tempArr[0], letterTotalCount.get(tempArr[0]) + 1);
        else
            letterTotalCount.put(tempArr[0], 1);

        if (letterTotalCount.containsKey(tempArr[1]))
            letterTotalCount.replace(tempArr[1], letterTotalCount.get(tempArr[1]) + 1);
        else
            letterTotalCount.put(tempArr[1], 1);
    }

    // holds final result
    HashMap<char[], Double> finalMap = new HashMap<>();
    for (HashMap.Entry<char[], Integer> entry : frequencies.entrySet()) {
        char[] tempArr = entry.getKey();
        // this is where I'm confused.
        // trying to copy `frequencies.sort.each {|key, value| final[key] = (value / letter_total_count[key[0]])  }`
        double tempDouble = entry.getValue() / letterTotalCount.get(entry.getKey());
        finalMap.put(tempArr, tempDouble);
    }
}
}
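Three details of the attempt above behave differently from the Ruby script. Java arrays use identity-based equals and hashCode, so a freshly created char[] never matches an existing HashMap key and containsKey(tempArr) is always false. Dividing one Integer by another is integer division, whereas the Ruby hashes hold floats (Hash.new(0.0)). And letterTotalCount.get(entry.getKey()) passes a char[] to a map keyed by Character, so it returns null. A minimal sketch of the counting step with String keys and Double values might look like this; the names PairCounter and countPairs are hypothetical, and the filtering follows the Ruby loop (the second character must be a letter, the first a letter or a space) rather than the replaceAll approach in the attempt.

```java
import java.util.HashMap;
import java.util.Map;

class PairCounter {
    // Counts two-character pairs, keyed by String so that equal pairs
    // hash to the same entry (unlike char[] keys).
    static Map<String, Double> countPairs(String text) {
        Map<String, Double> frequencies = new HashMap<>();
        char last = ' ';
        for (char raw : text.toCharArray()) {
            char current = Character.toLowerCase(raw);
            boolean currentIsLetter = current >= 'a' && current <= 'z';
            boolean lastIsLetterOrSpace =
                    (last >= 'a' && last <= 'z') || last == ' ';
            if (currentIsLetter && lastIsLetterOrSpace) {
                // merge inserts 1.0 for a new pair or adds 1.0 to the old count.
                frequencies.merge("" + last + current, 1.0, Double::sum);
            }
            last = current;
        }
        return frequencies;
    }

    public static void main(String[] args) {
        // Hypothetical sample input; real input would come from the names file.
        System.out.println(countPairs("Al Ann"));
    }
}
```

Keeping the counts as Double means a later division by a per-letter total yields a fraction instead of being truncated to 0.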
