Find repeated words in a string and the symbols between them

Time: 2013-10-29 15:57:26

Tags: c string

I need to find two identical words in a sentence, and then find how many symbols there are between those two words. I can't figure out the algorithm.

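For example, the sentence is:

This is nice table and nice chair.

The repeated word is: nice

The number of symbols between the two occurrences is 11 or 8 (I don't know whether spaces count as symbols).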

int main()
{
    char text[200];
    printf("Enter one sentence \n ");
    gets(text);
    /* rest of the program goes here */
}

Maybe start with something like:

dist = strtok(text, " ,.!?");
while (dist != NULL)
{
    printf("%s\n", dist);
    dist = strtok(NULL, " ,.!?");
}

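That prints every word, and then I could search for the identical words. If I had two strings I would use strstr, but I don't know how to do this with a single string.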

4 answers:

Answer 0: (score: 2)

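  1. Use toupper or tolower to convert the whole string to upper or lower case. This makes the later comparisons easier.

  2. Using strtok, build an array of char* values that point to the start of each word in the sentence. You can drop empty strings and punctuation at this stage if you like.

  3. Use nested loops and the strcmp function to compare every pair of words in this array.

  4. When two words match, compute the distance between them with pointer arithmetic and the strlen function.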
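
The four steps above could be sketched roughly as follows. This is only a minimal illustration, not the answerer's own code: the function name gap_between_repeats, the MAX_WORDS limit and the choice to return the gap for the first matching pair are all assumptions made for the example. It counts every byte between the end of the first word and the start of the second, blanks included.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_WORDS 64   /* assumed upper bound on words per sentence */

/* Hypothetical helper: returns the number of characters between the first
 * pair of identical words in text (case-insensitive), or -1 if no word
 * repeats. text is modified in place (lower-cased and split by strtok). */
long gap_between_repeats(char *text)
{
    char *words[MAX_WORDS];
    size_t n = 0;

    /* Step 1: lower-case the whole string. */
    for (char *p = text; *p; ++p)
        *p = (char)tolower((unsigned char)*p);

    /* Step 2: collect a pointer to the start of every word. */
    for (char *w = strtok(text, " ,.!?"); w && n < MAX_WORDS;
         w = strtok(NULL, " ,.!?"))
        words[n++] = w;

    /* Step 3: compare every pair of words. */
    for (size_t i = 0; i < n; ++i)
        for (size_t j = i + 1; j < n; ++j)
            if (strcmp(words[i], words[j]) == 0)
                /* Step 4: start of second word minus end of first word. */
                return (long)(words[j] - (words[i] + strlen(words[i])));

    return -1;
}
```

Called on a writable copy of the sentence from the question (for example char s[] = "This is nice table and nice chair.";), this should give 11, the count that includes the blanks. strtok overwrites delimiters with null bytes but does not move anything, which is why the pointer subtraction still reflects positions in the original sentence.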

Answer 1: (score: 2)

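Here is my suggestion:

+ Create an array of strings, e.g. words[50][20], or use dynamic memory allocation if you are familiar with it.

+ Copy each character of text[] into words[0] until you reach a blank. Then move on to words[1], and so on until the end of text[].

+ Now all you need to do is strcmp all the words against each other to find identical strings. In the example above you should get strcmp(words[2],words[5])=0.

+ To find how many symbols are between them, just sum the lengths of all the words in between, e.g. strlen(words[3])+strlen(words[4]) in the example. If you want to count the blanks as well, add 1 to the sum for each word in between, plus 1 more, because k words are surrounded by k + 1 blanks.

That is the algorithm. You should implement the code yourself.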
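
For readers who want to check their own attempt, here is one possible sketch of the splitting and summing steps. The names split_words and symbols_between, the truncation of over-long words and the count_blanks flag are assumptions made for this example. The blank count assumes words are separated by exactly one blank.

```c
#include <stddef.h>
#include <string.h>

#define MAX_WORDS 50
#define MAX_LEN   20

/* Hypothetical helper: split text on blanks into words; returns the count.
 * Words longer than MAX_LEN - 1 characters are truncated. */
size_t split_words(const char *text, char words[MAX_WORDS][MAX_LEN])
{
    size_t n = 0;

    while (*text && n < MAX_WORDS) {
        size_t len = 0;

        while (*text == ' ')             /* skip blanks before the word */
            ++text;
        if (!*text)
            break;
        while (*text && *text != ' ') {  /* copy one word */
            if (len < MAX_LEN - 1)
                words[n][len++] = *text;
            ++text;
        }
        words[n++][len] = '\0';          /* terminate the copied word */
    }
    return n;
}

/* Hypothetical helper: sum of the lengths of the words strictly between
 * indexes i and j (requires i < j). If count_blanks is non-zero, the
 * j - i single blanks that surround those words are added as well. */
size_t symbols_between(char words[MAX_WORDS][MAX_LEN],
                       size_t i, size_t j, int count_blanks)
{
    size_t total = 0;

    for (size_t k = i + 1; k < j; ++k)
        total += strlen(words[k]);
    if (count_blanks)
        total += j - i;
    return total;
}
```

For the sentence in the question, split_words should produce 7 words with words[2] and words[5] both equal to "nice", and symbols_between(words, 2, 5, 0) and symbols_between(words, 2, 5, 1) should give 8 and 11 respectively. Because only blanks are treated as separators here, the last word keeps its trailing full stop ("chair.").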

Answer 2: (score: 1)

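The algorithm would be something like this:

  1. Using strtok (or manual parsing if you prefer, since strtok is not thread-safe), get all the words in the sentence.
  2. At each step, insert the word into a map (hash table) where the key is the word itself and the value is the position at which it starts in the sentence. The values are stored in an array.
  3. Once the input is parsed, you end up with a map whose keys are the words and whose values are sorted arrays of start positions in the original sentence.
  4. Now, to get the number of characters, all you have to do is subtract any 2 consecutive positions in the array that belongs to a word, provided the array has at least 2 elements, i.e. the word appears in the sentence more than once. Example: suppose "nice" appears at pos 10 and pos 20. You will have these 2 values in the positions array in the map, so you can compute the distance. Subtract the length of the word as well if you only want the characters between the two occurrences.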
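
Standard C has no built-in hash table, so the sketch below swaps in a small fixed-size array with linear lookup as a stand-in for the map. It also uses strspn and strcspn instead of strtok, so the input is not modified. The names struct entry and build_positions and both size limits are assumptions made for this illustration only.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 64   /* assumed limit on distinct words */
#define MAX_POS     16   /* assumed limit on occurrences per word */

/* One "map" entry: a word (pointer and length into the sentence) and the
 * byte offsets at which it starts, in increasing order. */
struct entry {
    const char *word;
    size_t len;
    size_t pos[MAX_POS];
    size_t npos;
};

/* Hypothetical helper: record every word of text and its start offsets in
 * map. Returns the number of distinct words stored. */
size_t build_positions(const char *text, struct entry map[MAX_ENTRIES])
{
    static const char delims[] = " ,.!?";
    size_t nmap = 0;
    const char *p = text;

    for (;;) {
        p += strspn(p, delims);              /* skip delimiters */
        if (!*p)
            break;
        size_t len = strcspn(p, delims);     /* length of this word */
        size_t k;

        for (k = 0; k < nmap; ++k)           /* linear lookup, not hashing */
            if (map[k].len == len && memcmp(map[k].word, p, len) == 0)
                break;
        if (k == nmap) {                     /* word not seen before */
            if (nmap == MAX_ENTRIES) {       /* table full: skip the word */
                p += len;
                continue;
            }
            map[k].word = p;
            map[k].len = len;
            map[k].npos = 0;
            ++nmap;
        }
        if (map[k].npos < MAX_POS)
            map[k].pos[map[k].npos++] = (size_t)(p - text);
        p += len;
    }
    return nmap;
}
```

For the sentence in the question, the entry for "nice" should hold the positions 8 and 23, so pos[1] - pos[0] - len gives the 11 characters between the two occurrences. A real hash table would make the lookup faster for long inputs. The linear scan is only there to keep the example short.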

Answer 3: (score: 0)

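It depends on what you are going to use the result for. Is it for modifying the string, printing statistics, copying out words, and so on? Does it have to be thread-safe, does it have to support constant strings, etc.? And which matters more, efficiency or simplicity?

One approach is to rely entirely on pointers.

  1. Point ptr1 at the first word.
  2. Point ptr2 at the next word after ptr1.
    1. Check the word at ptr2 against the word at ptr1.
    2. Advance ptr2 to the next word.
    3. If no match was found, go to 2.1.
  3. If no match was found, advance ptr1 to the next word.
  4. Go to 2.

A bare skeleton for comparing words could look like this: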

    int compare_word(char *word1, char *word2)
    {
            int i;
    
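            /* Cast to unsigned char: isalpha() is undefined for negative values. */
            for (i = 0;
                    word1[i] && word2[i] &&
                    word1[i] == word2[i] &&
                    isalpha((unsigned char)word1[i]); ++i)
                    ;
    
            return i && !isalpha((unsigned char)word1[i]) &&
                    !isalpha((unsigned char)word2[i]);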
    }
    

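One thing to keep in mind when calculating the distance between words is multi-byte string formats such as UTF-8. A single letter can take several bytes: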

    a æøå a
     ^---^ 8 bytes, 5 characters.
    

You can use mbstowcs to get the length of a multi-byte sequence, but then you also have to pay attention to the locale. A typical scenario:

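    /* Needs <locale.h>, <stdio.h>, <stdlib.h> and <string.h>. */
    char *test = "æøå";
    
    /* Both functions return size_t, so cast to match the %u format. */
    printf("%s: %u\n", test, (unsigned)mbstowcs(NULL, test, 0));
    printf("%s: %u\n", test, (unsigned)strlen(test));
    
    setlocale(LC_ALL, "");
    
    puts("-----------------------------------------------");
    printf("%s: %u\n", test, (unsigned)mbstowcs(NULL, test, 0));
    printf("%s: %u\n", test, (unsigned)strlen(test));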
    

Result:

    æøå: 4294967295
    æøå: 6
    -----------------------------------------------
    æøå: 3
    æøå: 6
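
If the input is known to be valid UTF-8, a locale-independent way to count characters is to count every byte that is not a continuation byte (bit pattern 10xxxxxx). The helper below is a hypothetical sketch under that assumption. It counts code points, not user-perceived characters, and performs no validation.

```c
#include <stddef.h>

/* Hypothetical helper: number of code points in the first n bytes of s,
 * assuming s holds valid UTF-8. Continuation bytes are skipped. */
size_t utf8_count(const char *s, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; ++i)
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            ++count;
    return count;
}
```

For the 8 bytes between the two a's in the example above (blank, æ, ø, å, blank), this should return 5.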
    

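Anyway. As a sample of the concept, here are a few lines of code. Note that it is not multi-byte safe (only ASCII gives sensible results). The numbers at the end of the dashed line are "distance from the start of word 1 to the start of word 2", "distance from the end of word 1 to the start of word 2" and "word width". Sample output: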

    $ ./wordword 
    Enter one sentence:
    Lizzie Borden took an axe And gave her mother forty whacks When she saw what she had done She gave her father forty-one.
    
    Lizzie Borden took an axe And gave her mother forty whacks When she saw what she had done She gave her father forty-one.
                                  ^---------------------------------------------------------------^ (64, 60, 4)
    MATCH: 'gave' 60 bytes of separation.  (Press enter for next.)
    
    Lizzie Borden took an axe And ____ her mother forty whacks When she saw what she had done She ____ her father forty-one.
                                       ^---------------------------------------------------------------^ (64, 61, 3)
    MATCH: 'her' 61 bytes of separation.  (Press enter for next.)
    
    Lizzie Borden took an axe And ____ ___ mother forty whacks When she saw what she had done She ____ ___ father forty-one.
                                                  ^---------------------------------------------------------------^ (64, 59, 5)
    MATCH: 'forty' 59 bytes of separation.  (Press enter for next.)
    
    Lizzie Borden took an axe And ____ ___ mother _____ whacks When she saw what she had done She ____ ___ father _____-one.
                                                                    ^------------^ (13, 10, 3)
    MATCH: 'she' 10 bytes of separation.  (Press enter for next.)
    

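Sample code. (OK, I went a bit overboard on completeness here. It started out at 30 lines and grew quite a bit. It still has plenty of shortcomings and is only meant as an example of using pointers and so on.):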

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <ctype.h>
    
    #define FMT_BBLACK     "\033[1;30m"                /* Bold color black */
    #define FMT_BRED       "\033[1;31m"                /* Bold color red */
    #define FMT_BGREEN     "\033[1;32m"                /* Bold color green */
    #define FMT_BYELLOW    "\033[1;33m"                /* Bold color yellow */
    #define FMT_BBLUE      "\033[1;34m"                /* Bold color blue */
    #define FMT_BMAGENTA   "\033[1;35m"                /* Bold color magenta */
    #define FMT_BCYAN      "\033[1;36m"                /* Bold color cyan */
    #define FMT_BWHITE     "\033[1;37m"                /* Bold color white */
    #define FMT_NONE       "\033[0m"                   /* Reset */
    #define FMT_MATCH      FMT_BRED
    
    #define DEL_NONE       0x00                        /* Keep words. (Causes re-match) */
    #define DEL_WORD1      0x01                        /* Remove first word-match */
    #define DEL_WORD2      0x02                        /* Remove second word-match */
    #define DEL_BOTH       (DEL_WORD1 | DEL_WORD2)
    
    /* Print graph */
    int debug = 1;
    
    /* ********************************************************************** *
     *        Helper functions
     * ********************************************************************** */
    /* Return pointer to next alpha character,
     * or to the terminating null byte at end of C-string. */
    char *skip_noword(char *p)
    {
            while (*p && !isalpha((unsigned char)*p))
                    ++p;
            return p;
    }
    
    /* Return pointer to byte after last alpha of current word
     * (the terminating null byte at end of C-string). */
    char *eof_word(char *p)
    {
            while (*p && isalpha((unsigned char)*p))
                    ++p;
            return p;
    }
    
    /* Return pointer to first letter of next word,
     * or to the terminating null byte at end of C-string. */
    char *next_word(char *p)
    {
            p = eof_word(p);
            return skip_noword(p);
    }
    
    /* Compare whole word starting at p1 with word starting at p2.
     * Return 1 on match, else 0.
     * */
    int compare_word(char *p1, char *p2)
    {
            int i;
    
            for (i = 0;
                    p1[i] && p2[i] &&
                    isalpha((unsigned char)p1[i]) &&
                    p1[i] == p2[i]; ++i)
                    ;
    
            return i && !isalpha((unsigned char)p1[i]) &&
                    !isalpha((unsigned char)p2[i]);
    }
    
    /* ********************************************************************** *
     *        Search routine
     * ********************************************************************** */
    /* Find next word with a matching entry.
     * Return pointer to first word.
     * Set match to matching entry.
     * */
    char *word_word(char *buf, char **match)
    {
            char *p;
            *match = NULL;
    
            buf = skip_noword(buf);
    
            /* Outer loop.
             * Advance one and one word. */
            while (*buf) {
                    /* Inner loop.
                     * Compare current buf word with rest of words after it. */
                    p = next_word(buf);
                    while (*p) {
                            if (compare_word(buf, p)) {
                                    *match = p;
                                    return buf;
                            }
                            p = next_word(p);
                    }
                    buf = next_word(buf);
            }
    
            return (char*)NULL;
    }
    
    /* ********************************************************************** *
     *        Clear, Copy, Print etc.
     * ********************************************************************** */
    
    /* Bytes between end of one word to beginning of next.
     * */
    size_t words_dist(char *w1, char *w2)
    {
            return w2 - eof_word(w1);
    }
    
    /* Replace all alpha characters with given char.
     * */
    void clear_word(char *p, char r)
    {
            while (*p && isalpha((unsigned char)*p))
                    *p++ = r;
    }
    
    /* Return a copy of word pointed to by p.
     * */
    void *word_cpy(char *p)
    {
            void *buf;
            char *start = p;
            size_t n;
    
            n = eof_word(p) - start + 1;
            if (!(buf = malloc(n)))
                    return (void*)NULL;
            memcpy(buf, start, n);
            ((char*)buf)[n - 1] = 0x00;
    
            return buf;
    }
    
    /* Print graph showing position of p2 and p3 in p1.
     * */
    void explain(char *p1, char *p2, char *p3)
    {
            size_t n1 = p3 - p2;
            size_t n2 = words_dist(p2, p3);
    
            puts(p1);
            while (p1++ != p2)
                    putchar(' ');
            putchar('^');
            while (++p2 != p3)
                    putchar('-');
            printf("^ (%zu, %zu, %zu)\n", n1, n2, n1 - n2);
    }
    
    /* Print C-string using color.
     *
     * */
    void print_word(FILE *out, char *word)
    {
            fprintf(out, "%s%s%s", FMT_MATCH, word, FMT_NONE);
    }
    
    /* Print single word pointed to by p in (longer) C-string.
     * Use dynamic buffer.
     * */
    void print_word_safe(FILE *out, char *p)
    {
            char *word;
    
            if (!(word = word_cpy(p)))
                    return;
            print_word(out, word);
            free(word);
    }
    
    /* Print single word pointed to by p in (longer) C-string.
     * Modify and reset source.
     * */
    void print_word_mod(FILE *out, char *p)
    {
            char *start = p;
            char csave;
    
            p = eof_word(p);
            csave = *p;
            *p = 0x00;
            print_word(out, start);
            *p = csave;
    }
    
    /* ********************************************************************** *
     *        Main
     * ********************************************************************** */
    int main(int argc, char *argv[])
    {
            char buf_scan[4096];    /* Buffer holding typed input. */
            char *buf_start;        /* Start of buffer. */
            char *buf_pos;          /* Current position in buffer. */
            char *match;            /* Position for matched word. */
            int delete;             /* Delete flag mask. */
    
            debug = 1;              /* 1=Print explanation. */
            delete = DEL_BOTH;      /* DEL_[NONE, WORD1, WORD2, BOTH] */
    
            if (argc > 1) {
                    /* Use first argument instead of user input. */
                    buf_start = argv[1];
            } else {
                    /* Get user input. */
                    buf_start = buf_scan;
                    fputs("Enter one sentence:\n", stderr);
                    if (!fgets(buf_scan, sizeof(buf_scan), stdin))
                            buf_scan[0] = 0x00;
                    /* Strip the trailing newline, if any (safe on empty input). */
                    buf_scan[strcspn(buf_scan, "\n")] = 0x00;
                    putc('\n', stderr);
            }
    
            buf_pos = buf_start;
            /* Get next matching pair. */
            while ((buf_pos = word_word(buf_pos, &match))) {
                    if (debug)
                            explain(buf_start, buf_pos, match);
                    /* Report findings */
                    fputs("MATCH: ", stderr);
                    print_word_mod(stderr, buf_pos);
                    fprintf(stderr,
                            " %zu bytes of separation.",
                            words_dist(buf_pos, match)
                    );
                    /* Clear out matched word pair. */
                    if (delete & DEL_WORD1) {
                            clear_word(buf_pos, '_');
                    }
                    if (delete & DEL_WORD2) {
                            clear_word(match, '_');
                    }
                    /* Advance head to next word. */
                    buf_pos = next_word(buf_pos);
                    fputs("  (Press enter for next.)", stderr);
                    getchar();
                    putc('\n', stderr);
            }
    
            if (0 && debug)
                    printf("FINE:\n%s\n\n", buf_start);
    
            return 0;
    }