Question

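I am trying to count all the syllables in each word of a string that has been read into an array.

Vowels that are adjacent to each other (a, e, i, o, u, y) count as a single syllable. For example, the "ee" in "peel" counts as 1 syllable. But the "u" and "e" in "juked" count as 2 syllables. An "e" at the end of a word does not count as a syllable. Also, each word has at least 1 syllable even if the previous rules don't apply.

I have a file containing a large string (words, spaces, and newlines) that is read into an array. I have code that counts the words by counting the spaces and newlines between them. See below: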

for (i = 0; i < lengthOfFile; i++)
{
    if (charArray[i] == ' ' || charArray[i] == '\n')
    {
        wordCount++;
    }
}

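Here charArray is the file read into an array (with fread), lengthOfFile is the total number of bytes in the file as determined with fseek, and wordCount is the total number of words.

From here, I need to somehow count the syllables of each word in the array, but I don't know where to start.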

Answer 1

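If you are still having trouble, it is only because you are overthinking the problem. Whenever you are counting, determining frequency, and so on, you can usually simplify things by using a "state loop". A state loop is nothing more than a loop in which you iterate over each character (or whatever the unit is) and handle whatever state you find yourself in, such as:

Have I read any characters yet? (if not, handle that state);
Is the current character whitespace? (if so, and assuming no multiple spaces for simplicity, you have reached the end of a word, so handle that state);
Is the current character non-whitespace and a non-vowel? (if so, and my last character was a vowel, increment my syllable count); and
What do I need to do regardless of the classification of the current char? (output it, set last = current, etc.)

That is basically it, and it translates into a single loop with a handful of tests to handle each state. You can also make sure that words like "my" and "he" are counted as a single syllable by checking whether the syllable count is zero when you reach the end of a word.

Putting it together, you can write a basic implementation like: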

#include <stdio.h>
#include <string.h>
#include <ctype.h>

int main (void) {

    int c, last = 0;                        /* current & last (int holds EOF) */
    const char *vowels = "AEIOUYaeiouy";    /* vowels (plus Yy) */
    size_t syllcnt = 0, totalcnt = 0;       /* word syllable cnt & total */

    while ((c = getchar()) != EOF) {        /* read each character */
        if (!last) {                        /* if 1st char (no last) */
            putchar (c);                    /* just output it */
            last = c;                       /* set last */
            continue;                       /* go get next */
        }

        if (isspace (c)) {                  /* if space, end of word */
            if (!syllcnt)                   /* if no syll, it's 1 (he, my) */
                syllcnt = 1;
            printf (" - %zu\n", syllcnt);   /* output syll cnt and '\n' */
            totalcnt += syllcnt;            /* add to total */
            syllcnt = 0;                    /* reset syllcnt to zero */
        }   /* otherwise */
        else if (!strchr (vowels, c))       /* if not vowel */
            if (strchr (vowels, last))      /* and last was vowel */
                syllcnt++;                  /* increment syllcnt */

        if (!isspace (c))                   /* if not space */
            putchar (c);                    /* output it */
        last = c;                           /* set last = c */
    }
    printf ("\n  total syllables: %zu\n", totalcnt);
}

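(Note: as mentioned above, this simple example implementation does not handle multiple spaces between words. You can add that as one more condition by checking whether !isspace (last). Can you work out where that check belongs? Hint: it is joined to an existing check with &&. The fine-tuning is left to you.)

Example Use/Output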
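The note above can also be worked through on an in-memory array such as the charArray from the question. What follows is only a minimal sketch under stated assumptions: count_syllables and is_vowel are hypothetical helper names, not part of the original answer, and the extra pass with a synthetic space is one way (an assumption, not the author's) to flush a final word that is not followed by whitespace.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* hypothetical helper: nonzero if ch is one of the vowels (plus Yy) */
static int is_vowel (int ch)
{
    return ch != 0 && strchr ("AEIOUYaeiouy", ch) != NULL;
}

/* hypothetical helper: total syllables in buf[0..len), using the same
 * vowel-followed-by-non-vowel rule as the loop above */
size_t count_syllables (const char *buf, size_t len)
{
    size_t syllcnt = 0, totalcnt = 0;
    int last = ' ';                 /* treat start of buffer as whitespace */

    for (size_t i = 0; i <= len; i++) {
        /* one extra pass with a synthetic space flushes the final word */
        int c = i < len ? (unsigned char)buf[i] : ' ';

        if (isspace (c)) {
            if (!isspace (last)) {  /* only the first space ends a word */
                totalcnt += syllcnt ? syllcnt : 1;  /* at least 1 (he, my) */
                syllcnt = 0;
            }
        }
        else if (!is_vowel (c) && is_vowel (last))
            syllcnt++;              /* vowel followed by a non-vowel */

        last = c;
    }
    return totalcnt;
}
```

With the array from the question, the call would be count_syllables (charArray, lengthOfFile). Runs of spaces, leading whitespace, and a missing trailing newline do not change the total.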

$ echo "my dog eats banannas he peels while getting juked" | ./syllablecnt
my - 1
dog - 1
eats - 1
banannas - 3
he - 1
peels - 1
while - 1
getting - 2
juked - 2

  total syllables: 13

If you need to read the words from a file, simply redirect the file as input to the program on stdin, e.g.

./syllablecnt < inputfile
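For comparison, the rules as stated in the question (each run of adjacent vowels is one syllable, a final "e" is not counted, and every word has at least one) can also be applied to a single word at a time. This is a rough sketch, assuming a hypothetical helper named word_syllables that receives one NUL-terminated word. It counts vowel groups rather than vowel-to-consonant transitions, so a word that ends in a vowel other than "e", such as "banana", may get a different count than the loop above gives.

```c
#include <stddef.h>
#include <string.h>

/* hypothetical helper: syllables in one NUL-terminated word, counted
 * as runs of adjacent vowels, ignoring a final 'e', minimum of 1 */
size_t word_syllables (const char *word)
{
    const char *vowels = "AEIOUYaeiouy";
    size_t len = strlen (word), count = 0;
    int in_group = 0;               /* nonzero while inside a vowel run */

    if (len && (word[len - 1] == 'e' || word[len - 1] == 'E'))
        len--;                      /* final 'e' is not counted */

    for (size_t i = 0; i < len; i++) {
        int v = strchr (vowels, word[i]) != NULL;
        if (v && !in_group)         /* first vowel of a new run */
            count++;
        in_group = v;
    }
    return count ? count : 1;       /* every word has at least one */
}
```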

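Edit - Reading From a File Into a Dynamically Allocated Buffer

Following on from the comments about wanting to read from a file (or stdin) into a dynamically sized buffer and then traverse the buffer to output the syllables per word and the total, you could do something like the following. It simply reads every character from the file into a buffer initially allocated to hold 8 characters and reallocates as required (doubling the allocation size each time a realloc is needed). That is a fairly standard and reasonably efficient buffer growth strategy. You are free to grow it by any amount you like, but avoid many tiny reallocations, because memory allocation is relatively expensive from a computing standpoint.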

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

#define NCHAR 8     /* initial characters to allocate */

int main (int argc, char **argv) {

    int c, last = 0;                        /* current & last (int holds EOF) */
    char *buffer;                           /* pointer to allocated buffer */
    const char *vowels = "AEIOUYaeiouy";    /* vowels */
    size_t syllcnt = 0, totalcnt = 0,       /* word syllable cnt & total */
            n = 0, size = NCHAR;
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("fopen-file");
        return 1;
    }

    /* allocate/validate initial NCHAR buffer size */
    if (!(buffer = malloc (size))) {
        perror ("malloc-buffer");
        return 1;
    }

    while ((c = fgetc(fp)) != EOF) {        /* read each character */
        buffer[n++] = c;                    /* store, increment count */
        if (n == size) {                    /* reallocate as required */
            void *tmp = realloc (buffer, 2 * size);
            if (!tmp) {                     /* validate realloc */
                perror ("realloc-tmp");
                break;      /* still n good chars in buffer */
            }
            buffer = tmp;   /* assign reallocated block to buffer */
            size *= 2;      /* update allocated size */
        }
    }
    if (fp != stdin)        /* close file if not stdin */
        fclose (fp);

    for (size_t i = 0; i < n; i++) {        /* loop over all characters */
        c = (unsigned char)buffer[i];       /* set to c to reuse code */
        if (!last) {                        /* if 1st char (no last) */
            putchar (c);                    /* just output it */
            last = c;                       /* set last */
            continue;                       /* go get next */
        }

        if (isspace(c) && !isspace(last)) { /* if space, end of word */
            if (!syllcnt)                   /* if no syll, it's 1 (he, my) */
                syllcnt = 1;
            printf (" - %zu\n", syllcnt);   /* output syll cnt and '\n' */
            totalcnt += syllcnt;            /* add to total */
            syllcnt = 0;                    /* reset syllcnt to zero */
        }   /* otherwise */
        else if (!strchr (vowels, c))       /* if not vowel */
            if (strchr (vowels, last))      /* and last was vowel */
                syllcnt++;                  /* increment syllcnt */

        if (!isspace (c))                   /* if not space */
            putchar (c);                    /* output it */
        last = c;                           /* set last = c */
    }
    free (buffer);      /* don't forget to free what you allocate */

    printf ("\n  total syllables: %zu\n", totalcnt);
}

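(You can do the same thing using fgets or POSIX getline, or you can size the allocation all at once with fseek/ftell or stat and then read the entire file into the buffer with a single call to fread. It is up to you.)

Memory Use/Error Check

In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address of the block so that (2) it can be freed when it is no longer needed.

It is imperative that you use a memory error checking program to make sure you do not attempt to access memory or write beyond or outside the bounds of your allocated block, do not attempt to read or base a conditional jump on an uninitialized value, and, finally, to confirm that you free all the memory you have allocated.

For Linux, valgrind is the usual choice. There are similar memory checkers for every platform. They are all simple to use: just run your program through it.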
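The fseek/ftell plus fread alternative mentioned above might look something like the following. It is a sketch only: read_whole_file is a hypothetical helper name, it assumes a regular seekable file (so it will not work on a pipe or on stdin from a terminal), and it opens the file in binary mode so the size reported by ftell matches the bytes fread returns.

```c
#include <stdio.h>
#include <stdlib.h>

/* hypothetical helper: read all of path into a malloc'd buffer.
 * Returns the buffer (caller must free it) with the byte count in
 * *len, or NULL on failure. The buffer is NUL-terminated. */
char *read_whole_file (const char *path, size_t *len)
{
    FILE *fp = fopen (path, "rb");
    char *buffer = NULL;
    long size;

    if (!fp)                                /* validate file open */
        return NULL;

    if (fseek (fp, 0, SEEK_END) != 0 ||     /* seek to end */
        (size = ftell (fp)) < 0 ||          /* position == file size */
        fseek (fp, 0, SEEK_SET) != 0) {     /* back to the start */
        fclose (fp);
        return NULL;
    }

    buffer = malloc ((size_t)size + 1);     /* +1 for the terminator */
    if (buffer) {
        *len = fread (buffer, 1, (size_t)size, fp); /* single read */
        buffer[*len] = '\0';
    }
    fclose (fp);
    return buffer;
}
```

The returned buffer and length can then be walked with the same loop over buffer[i] shown above, followed by a single free.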

$ valgrind ./bin/syllablecnt_array dat/syllables.txt
==19517== Memcheck, a memory error detector
==19517== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==19517== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==19517== Command: ./bin/syllablecnt_array dat/syllables.txt
==19517==
my - 1
dog - 1
eats - 1
banannas - 3
he - 1
peels - 1
while - 1
getting - 2
juked - 2

  total syllables: 13
==19517==
==19517== HEAP SUMMARY:
==19517==     in use at exit: 0 bytes in 0 blocks
==19517==   total heap usage: 5 allocs, 5 frees, 672 bytes allocated
==19517==
==19517== All heap blocks were freed -- no leaks are possible
==19517==
==19517== For counts of detected and suppressed errors, rerun with: -v
==19517== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

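Always confirm that you have freed all the memory you have allocated and that there are no memory errors.

Look things over and let me know if you have further questions.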
