Remove duplicate words from a sentence in C

Time: 2013-08-14 05:08:37

Tags: c string algorithm char

I need to write a function that removes all duplicate words from a string. The function below does the job, but not correctly.

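Input: This is a simple test for lesson2 Quit lesson2

Output: This a simple test for lesson2 Quit

As you can see, the function removes "is" from the sentence, which is not correct.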

#include <stdlib.h>
#include <string.h>

void RemoveDuplicates(char text[], size_t text_size, char** output)
{
    char *element;
    /* Allocate size for output. */
    *output = (char*) malloc(text_size);
    *output[0] = '\0';

    /* Split string into tokens */
    element = strtok(text, " ");
    if (element != NULL)
        strcpy(*output, element);

    while( (element = strtok(NULL, " ")) != NULL ) {
        /* Is the element already in the result string? */
        if (strstr(*output, element) == NULL) {
            strcat(*output, " " );
            strcat(*output, element );
        }
    }
}

Updated version of the code (after @Rohan's answer)

Input: This is a simple test for lesson2 Quit lesson2

Output: This is a simple test for lesson2 Quit lesson2

#include <stdlib.h>
#include <string.h>

void RemoveDuplicates(char text[], size_t text_size, char** output)
{
    char *temp = NULL;
    char *element;
    /* Allocate size for output. */
    *output = (char*) malloc(text_size);
    *output[0] = '\0';

    /* Split string into tokens */
    element = strtok(text, " ");
    if (element != NULL)
        strcpy(*output, element);

    while( (element = strtok(NULL, " ")) != NULL ) {
        /* Is the element already in the result string? */
        temp = strstr(*output, element);
        /* check for space before/after it or '\0' after it. */
        if (temp == NULL || temp[-1] == ' ' || temp[strlen(element)] == ' ' || temp[strlen(element)] == '\0'  ) {

            strcat(*output, " " );
            strcat(*output, element );
        }
    }
}

3 Answers:

Answer 0 (score: 4)

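You need to check for element as a whole word, not as a plain substring.

What happens is that your input string contains "is" twice: once as part of "This", and once as the actual word "is".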

This is a simple test for lesson2 Quit lesson2
  ^^ ^^

strstr() matches both, so the real word "is" is treated as a duplicate and dropped. But you only want to find duplicate words.

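You can do this by checking for a space ' ' before and after the word that was found (or the start of the string before it, if it is the first word). If it is the last word, check for '\0' after it instead.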

Try updating the while loop to:

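char temp[512] = { 0 };   // " element ": use a sufficiently large array
char padded[512];         // " output ": use a sufficiently large array
while( (element = strtok(NULL, " ")) != NULL ) {
        /* Is the element already in the result string as a whole word? */
        // Pad both the word and the result with spaces, so that the first
        // and the last word of the result can be matched as well.
        snprintf(temp, sizeof temp, " %s ", element);
        snprintf(padded, sizeof padded, " %s ", *output);
        if(strstr(padded, temp) == NULL) {
            strcat(*output, " " );
            strcat(*output, element );
        }
    }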

Answer 1 (score: 0)

Disclaimer: this does not fix your algorithm; see @Rohan's answer for the algorithm fix.


To fix your code, do the following:

*output = (char*) malloc(text_size);

...should be:

char *output = malloc(text_size);

...and change:

*output[0] = '\0';

...to:

output[ 0 ] = '\0';

...and don't cast the block of memory returned by malloc. You can read more about this here. Note that output[ 0 ] is equivalent to *( output + 0 ).

Next, change:

strcpy(*output, element);

...to:

strcpy(output, element);

...then change:

if (strstr(*output, element) == NULL) {
  strcat(*output, " " );
  strcat(*output, element );
}

...to:

if (strstr(output, element) == NULL) {
  strcat(output, " " );
  strcat(output, element );
}

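...note that once output is declared as a plain char * (as above), it is already a pointer to the buffer. Writing *output dereferences that pointer and gives you a single char. strcpy and strcat expect dest to be a pointer to a char array, and strstr likewise takes char pointers.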

Answer 2 (score: 0)

You could try something like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TOKENS 200 // let's assume there are at most 200 tokens in a sentence

char test_text[] = "This is a is is simple test simple for lesson2 Quit";

int main(void)
{
  char* array[MAX_TOKENS];   // pointers to the unique tokens
  int unique_tokens = 0;     // number of unique tokens found in the string

  // first tokenize the string and put it into a structure that is a bit more flexible
  char* element = strtok(test_text, " ");
  for (; element != NULL; element = strtok(NULL, " "))
  {
     int foundToken = 0;
     int i;
     // do we have it from before?
     for (i = 0; i < unique_tokens && !foundToken; ++i)
     {
       if ( !strcmp(element, array[i]) )
       {
         foundToken = 1;
       }
     }

     // new token, add it (as long as there is room in the array)
     if ( !foundToken && unique_tokens < MAX_TOKENS )
     {
       char* copy = strdup(element); // allocates space for the element and copies it (POSIX; standard C since C23)
       if ( copy != NULL )
       {
         array[unique_tokens++] = copy;
       }
     }
  }

  // now recreate the result without the duplicates.
  // The result is never longer than the input, so sizeof test_text is enough.
  char result[sizeof test_text] = {0};
  int i;
  for (i = 0; i < unique_tokens; ++i)
  {
    strcat(result, array[i]);
    if ( i < unique_tokens - 1 )
    {
      strcat(result, " ");
    }
    free(array[i]); // release the copy made by strdup
  }

  puts( result );

  return 0;
}