Question

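I am trying to write a program that uses the CMU Pronouncing Dictionary (cmudict.txt) to count all the syllables in a text file of English words. What is the best way to do this? The program is supposed to analyze a supplied text file by counting its words, sentences, and characters. I was able to finish those parts without any trouble. Now I am trying to figure out how to use cmudict.txt to count syllables, and I don't know where to start. Thanks!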

while ((line = reader.readLine()) != null) {
    if (line.isEmpty()) {
        // A blank line marks a paragraph break
        numParagraph++;
    } else {
        // Count the number of characters in the file
        numChar += line.length();
        // Count the number of words in the file
        // (trim first so leading whitespace does not yield an empty token)
        String[] wordList = line.trim().split("\\s+");
        numWords += wordList.length;
        // Count the number of sentences in the file
        for (int i = 0; i < line.length(); i++) {
            if (delimiters.indexOf(line.charAt(i)) != -1) {
                sentenceCount++;
            }
        }
        // Word count of the current line, used for the average characters per word
        wordListLength = wordList.length;
    }
}

Answer 1

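To count syllables with CMUdict, you just need to find the CMUdict entry for the word you are analyzing. Once you have the entry, count the vowel sounds in it. A vowel sound always ends in a digit (0, 1, or 2), which marks its stress.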
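If you would rather read cmudict.txt directly, here is a minimal sketch of that idea. It assumes the classic file layout (a word, whitespace, then space-separated phones, with ";;;" starting a comment and "(1)" marking an alternate pronunciation). The sample lines and the names parse_entries and syllables_in are made up for illustration; check them against your copy of the file.

```python
import re

# Hypothetical sample lines in the classic cmudict.txt layout (an assumption):
# a word, whitespace, then space-separated phones; ";;;" starts a comment.
SAMPLE_LINES = [
    ";;; comment line",
    "HELLO  HH AH0 L OW1",
    "HELLO(1)  HH EH0 L OW1",
    "WORLD  W ER1 L D",
    "DICTIONARY  D IH1 K SH AH0 N EH2 R IY0",
]

def parse_entries(lines):
    """Map each lowercase word to a list of pronunciations (lists of phones)."""
    entries = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comments
        head, *phones = line.split()
        # Drop a trailing "(1)"-style marker so variants share one key
        word = re.sub(r"\(\d+\)$", "", head).lower()
        entries.setdefault(word, []).append(phones)
    return entries

def syllables_in(phones):
    """Count the phones that end in a stress digit, i.e. the vowel sounds."""
    return sum(1 for p in phones if p[-1].isdigit())

entries = parse_entries(SAMPLE_LINES)
for word in ("hello", "world", "dictionary"):
    print(word, syllables_in(entries[word][0]))
```

For a real file you would pass the open file object to parse_entries instead of SAMPLE_LINES. The same loop translates fairly directly to Java with a HashMap and String.split.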

There is a Python module named cmudict that automatically loads the CMUdict data and preprocesses it. That is where you should start.

import cmudict

# Load the dictionary once; calling cmudict.dict() on every lookup rebuilds it each time
CMU_DICT = cmudict.dict()

def lookup_word(word_s):
    # Dictionary keys are lowercase words
    return CMU_DICT.get(word_s.lower())

def count_syllables(word_s):
    count = 0
    phones = lookup_word(word_s) # this returns a list of matching phonetic rep's
    if phones:                   # if the list isn't empty (the word was found)
        phones0 = phones[0]      #     process the first
        count = len([p for p in phones0 if p[-1].isdigit()]) # count the vowels
    return count

word_s = 'hello'
phones = lookup_word(word_s)
count = count_syllables(word_s)
print(f"PHONES({word_s!r}) yields {phones}\nCOUNT is {count}")

There is more you can do with CMUdict than this, but it is a start... Happy coding!

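PS: Rereading your post, I see I missed part of it. You can use a regular expression (regex) to split all of the input text on whitespace and punctuation. That should give you all the words. You can then run each word through the routine above to get its syllable count, summing the counts as you go. You can also use the result to verify your earlier word count. Your current word-counting routine, though, may only detect where each word starts and not where it ends; unless you extend it to detect word endings too, it may not help you here. If you do extend it, you can skip the regex split.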
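The split-and-sum idea could be sketched roughly like this. The small PRONUNCIATIONS table is a hypothetical stand-in for the real dictionary (in practice it would come from cmudict.dict() or from parsing cmudict.txt), and tokenize and count_text_syllables are illustrative names, not part of any library.

```python
import re

# Hypothetical stand-in for the real dictionary: word -> list of pronunciations.
PRONUNCIATIONS = {
    "hello": [["HH", "AH0", "L", "OW1"]],
    "world": [["W", "ER1", "L", "D"]],
    "again": [["AH0", "G", "EH1", "N"]],
}

def tokenize(text):
    """Return the lowercase words in text, keeping internal apostrophes."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def count_text_syllables(text, pronunciations):
    """Return (total syllables, list of words missing from the dictionary)."""
    total = 0
    unknown = []
    for word in tokenize(text):
        phones = pronunciations.get(word)
        if phones:
            # Use the first pronunciation; vowel phones end in a stress digit
            total += sum(1 for p in phones[0] if p[-1].isdigit())
        else:
            unknown.append(word)
    return total, unknown

total, unknown = count_text_syllables(
    "Hello, world! Hello again... xyzzy", PRONUNCIATIONS
)
print(f"syllables: {total}, not found: {unknown}")
```

Collecting the words that are not in the dictionary, instead of silently counting them as zero, lets you decide later whether to skip them or fall back to some heuristic.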
