Split a string and return an array of strings

Time: 2019-01-18 20:49:44

Tags: c

I want split_str to take "bob is great" and return ["bob", "is", "great"].

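More precisely: foo = split_str("bob is great", " ") should assign ["bob", "is", "great"] to foo, so that foo becomes an array of 3 strings, split on the given space character. But I want this to work not only for an array of 3 strings, but for any number of strings.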

char* split_str(char*, char[]);

char* split_str(char* str, char delim[]) {
    char copied_input[strlen(str)];
    strncpy (copied_input, str, strlen(str)+1);

    char* result[strlen(str)+1];  // add 1 for the "NULL" char

    int tmp = 0;  // preparing iterator
    result[tmp] = strtok (copied_input, delim);  // obtaining first word

    while (result[tmp] != NULL) {  // to populate the whole array with each words separately
        result[++tmp] = strtok (NULL, delim);
    }

    return result;
}

This is more or less how I am trying to use it:

int main (void)
{
    int MAX_AMNT = 50;  // maximum amount of args to parse
    char *bar[MAX_AMNT];
    bar = split_str("bob is great", " ");
    tmp = 0;
    while (bar[tmp] != NULL) {
        fprintf (stdout, "Repeating, from array index %d: %s\n", tmp, bar[tmp++]);
    }
}

I am new to C, so I may be phrasing the question wrongly (pointers and arrays, pointers to arrays and so on are still a bit of a headache for me).

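I know the return type of my function is wrong, and that returning a local variable (result) is probably wrong too, but I don't know how to proceed from here. I tried changing it to a void function and adding a third parameter as the variable to fill (like result), but I kept getting errors.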
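For reference, here is a minimal sketch of the third-parameter approach described above. In the question's code, result and copied_input are local arrays whose lifetime ends when split_str returns, so any pointers into them would dangle; copied_input is also one byte too small for the terminating '\0'; and bar = split_str(...) cannot compile because arrays are not assignable in C. Letting the caller own both the text and the pointer array avoids all three problems. The name split_into and its parameters are hypothetical, not from either answer.

```c
#include <string.h>

/* Hypothetical out-parameter version of split_str.
   str   : modifiable text; strtok writes '\0' into it, so it must not
           be a string literal.
   out   : caller-owned array of max pointers (max must be at least 1).
   Stores at most max - 1 words, always sets out[n] = NULL as an end
   mark, and returns the number of words stored. */
size_t split_into(char *str, const char *delim, char *out[], size_t max)
{
    size_t n = 0;
    char *tok = strtok(str, delim);     /* first word */

    while (tok != NULL && n + 1 < max) {
        out[n++] = tok;                 /* points into str, nothing is copied */
        tok = strtok(NULL, delim);      /* next word */
    }
    out[n] = NULL;                      /* end mark for a while (bar[i] != NULL) loop */
    return n;
}
```

Usage would be char buf[] = "bob is great"; char *bar[50]; split_into(buf, " ", bar, 50);. The words point into buf, so buf must stay alive (and unmodified) for as long as bar is used.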

2 answers:

Answer 0 (score: 4)

A solution:

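#define _POSIX_C_SOURCE 200809L /* strdup is POSIX, not ISO C before C23 */
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

char ** split(const char * str, const char * delim)
{
  /* count words */
  char * s = strdup(str);

  if (strtok(s, delim) == 0) {
    /* no word: release the copy before returning */
    free(s);
    return NULL;
  }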

  int nw = 1;

  while (strtok(NULL, delim) != 0)
    nw += 1;

  strcpy(s, str); /* restore initial string modified by strtok */

  /* split */
  char ** v = malloc((nw + 1) * sizeof(char *));
  int i;

  v[0] = strdup(strtok(s, delim));

  for (i = 1; i != nw; ++i)
    v[i] = strdup(strtok(NULL, delim));

  v[i] = NULL; /* end mark */

  free(s);

  return v;
}

int main()
{
  char ** v = split("bob is  great", " ");

  for (int i = 0; v[i] != NULL; ++i) {
    puts(v[i]);
    free(v[i]);
  }

  free(v);
  return 0;
}

As you can see, I add a null pointer at the end of the vector as an end mark, but it is easy to change this to return the number of words, etc.
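With the NULL end mark, the caller can recover the word count and release everything without knowing how many words there were. A small sketch of two helpers for such a vector; the names split_count and split_free are hypothetical and not part of the answer.

```c
#include <stdlib.h>

/* Hypothetical helper: number of words in a NULL-terminated vector
   such as the one returned by split(). A NULL vector counts as 0. */
size_t split_count(char **v)
{
    size_t n = 0;
    if (v == NULL)
        return 0;
    while (v[n] != NULL)
        n++;
    return n;
}

/* Hypothetical helper: free every word, then the array of pointers. */
void split_free(char **v)
{
    if (v == NULL)
        return;                     /* split() returned NULL: nothing to free */
    for (size_t i = 0; v[i] != NULL; i++)
        free(v[i]);                 /* each word was allocated separately */
    free(v);                        /* then the vector itself */
}
```

The loop in main could then be reduced to printing the words followed by a single split_free(v).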

Output:

bob
is
great

A second solution, taking alk's remarks into account:

#define _POSIX_C_SOURCE 200809L /* strdup is POSIX, not ISO C before C23 */
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

char ** split(const char * str, const char * delim)
{
  /* count words */
  char * s = strdup(str);

  if (s == NULL) /* out of memory */
    return NULL;

  if (strtok(s, delim) == 0) { /* no word */
    free(s);
    return NULL;
  }

  size_t nw = 1;

  while (strtok(NULL, delim) != 0)
    nw += 1;

  strcpy(s, str); /* restore initial string modified by strtok */

  /* split */
  char ** v = malloc((nw + 1) * sizeof(char *));

  if (v == NULL) {
    /* out of memory */
    free(s);
    return NULL;
  }

  if ((v[0] = strdup(strtok(s, delim))) == 0) {
    /* out of memory */
    free(v);
    free(s);
    return NULL;
  }

  size_t i;

  for (i = 1; i != nw; ++i) {
    if ((v[i] = strdup(strtok(NULL, delim))) == NULL) {
      /* out of memory, free previous allocs */
      while (i-- != 0)
        free(v[i]);
      free(v);
      free(s);
      return NULL;
    }
  }

  v[i] = NULL; /* end mark */

  free(s);

  return v;
}

int main()
{
  const char * s = "bob is still great";
  char ** v = split(s, " ");

  if (v == NULL)
    puts("no words or not enough memory");
  else {
    for (int i = 0; v[i] != NULL; ++i) {
      puts(v[i]);
      free(v[i]);
    }

    free(v);
  }
  return 0;
}

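When memory runs out, the return value is NULL (in an earlier revision of this answer it was the string to be split). NULL is also returned when the string contains no words, which is why main reports both cases with a single message. Of course, there are other ways to signal these cases that are just as easy.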
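One such way, sketched here under assumptions: return NULL only when memory runs out, and return a vector whose first element is already the NULL end mark when there are no words. The name split_ex is hypothetical. It also swaps strtok for a different technique, strspn/strcspn, which never modifies the input, so no strdup'd copy is needed and only standard C functions are used.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of split(): NULL means out of memory only.
   "No words" is reported as a vector with v[0] == NULL. */
char **split_ex(const char *str, const char *delim)
{
    size_t nw = 0;
    const char *p = str;

    /* first pass: count the words */
    for (;;) {
        p += strspn(p, delim);          /* skip delimiters */
        if (*p == '\0')
            break;
        nw++;
        p += strcspn(p, delim);         /* skip over the word */
    }

    char **v = malloc((nw + 1) * sizeof *v);
    if (v == NULL)
        return NULL;                    /* out of memory */

    /* second pass: copy each word */
    p = str;
    for (size_t i = 0; i < nw; i++) {
        p += strspn(p, delim);          /* start of word i */
        size_t len = strcspn(p, delim); /* its length */
        v[i] = malloc(len + 1);
        if (v[i] == NULL) {             /* out of memory: undo */
            while (i-- != 0)
                free(v[i]);
            free(v);
            return NULL;
        }
        memcpy(v[i], p, len);
        v[i][len] = '\0';
        p += len;
    }
    v[nw] = NULL;                       /* end mark */
    return v;
}
```

The caller then tests v == NULL for an allocation failure and v[0] == NULL for an empty result, and frees v in both non-NULL cases.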


Execution under valgrind:

==5078== Memcheck, a memory error detector
==5078== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==5078== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==5078== Command: ./a.out
==5078== 
bob
is
still
great
==5078== 
==5078== HEAP SUMMARY:
==5078==     in use at exit: 0 bytes in 0 blocks
==5078==   total heap usage: 7 allocs, 7 frees, 1,082 bytes allocated
==5078== 
==5078== All heap blocks were freed -- no leaks are possible
==5078== 
==5078== For counts of detected and suppressed errors, rerun with: -v
==5078== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 6 from 3)

Answer 1 (score: 2)

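One way to split a string containing an unknown number of words, and make the result returnable from the function, is to return a pointer-to-pointer-to-char. This gives a truly dynamic approach: you allocate some initial number of pointers (e.g. 2, 4, 8, etc.), walk the string with strtok while keeping track of how many pointers have been used, and allocate storage for each token (word) as you go. When the number of pointers used equals the number allocated, you simply realloc storage for more pointers and keep going.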

A short example implementing such a function, splitstring(), could look like the following:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NPTR    8   /* initial number of pointers to allocate */
#define MAXD   32   /* maximum no chars for delimiter */
#define MAXC 1024   /* maximum no chars for user input */

char **splitstring (const char *str, const char *delim, size_t *nwords)
{
    size_t nptr = NPTR,             /* initial pointers */
        slen = strlen (str);        /* length of str */
    char **strings = malloc (nptr * sizeof *strings),   /* alloc pointers */
        *cpy = malloc (slen + 1),   /* alloc for copy of str */
        *p = cpy;                   /* pointer to cpy */

    *nwords = 0;                    /* zero nwords */

    if (!strings) {     /* validate allocation of strings */
        perror ("malloc-strings");
        free (cpy);
        return NULL;
    }

    if (!cpy) {         /* validate allocation of cpy */
        perror ("malloc-cpy");
        free (strings);
        return NULL;
    }
    memcpy (cpy, str, slen + 1);    /* copy str to cpy */

    /* split cpy into tokens */
    for (p = strtok (p, delim); p; p = strtok (NULL, delim)) {
        size_t len;             /* length of token */
        if (*nwords == nptr) {  /* all pointers used/realloc needed? */
            void *tmp = realloc (strings, 2 * nptr * sizeof *strings);
            if (!tmp) {         /* validate reallocation */
                perror ("realloc-strings");
                free (cpy);     /* stored words are copies, cpy no longer needed */
                if (*nwords)    /* if words stored, return strings */
                    return strings;
                else {          /* no words, free pointers, return NULL */
                    free (strings);
                    return NULL;
                }
            }
            strings = tmp;      /* assign new block to strings */
            nptr *= 2;          /* update number of allocate pointers */
        }
        len = strlen (p);       /* get token length */
        strings[*nwords] = malloc (len + 1);    /* allocate storage */
        if (!strings[*nwords]) {                /* validate allocation */
            perror ("malloc-strings[*nwords]");
            break;
        }
        memcpy (strings[(*nwords)++], p, len + 1);  /* copy to strings */
    }
    free (cpy);     /* free storage of cpy of str */

    if (*nwords)    /* if words found */
        return strings;

    free (strings); /* no strings found, free pointers */
    return NULL;
}

int main (void) {

    char **strings = NULL, 
        string[MAXC],
        delim[MAXD];
    size_t nwords = 0;

    fputs ("enter string    : ", stdout);
    if (!fgets (string, MAXC, stdin)) {
        fputs ("(user canceled input)\n", stderr);
        return 1;
    }

    fputs ("enter delimiters: ", stdout);
    if (!fgets (delim, MAXD, stdin)) {
        fputs ("(user canceled input)\n", stderr);
        return 1;
    }

    if ((strings = splitstring (string, delim, &nwords))) {
        for (size_t i = 0; i < nwords; i++) {
            printf (" word[%2zu]: %s\n", i, strings[i]);
            free (strings[i]);
        }
        free (strings);
    }
    else
        fputs ("error: no words found or allocation failed\n", stderr);
}

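Note: the word count nwords is passed to splitstring() as a pointer, so the function can update the number of words and make it available back in the caller, while the function itself returns the pointer-to-pointer-to-char.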

Example Use/Output

$ ./bin/stringsplitdelim
enter string    : my dog has fleas and my cat has none and snakes don't have fleas
enter delimiters:
 word[ 0]: my
 word[ 1]: dog
 word[ 2]: has
 word[ 3]: fleas
 word[ 4]: and
 word[ 5]: my
 word[ 6]: cat
 word[ 7]: has
 word[ 8]: none
 word[ 9]: and
 word[10]: snakes
 word[11]: don't
 word[12]: have
 word[13]: fleas

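Note: above, a ' ' (space) was entered as the delimiter, so delim contains " \n". That is exactly what you want here, because the line-oriented input function fgets was used for user input: fgets keeps the trailing '\n' in string, and including '\n' in the delimiters keeps it out of the last word.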
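If you would rather not depend on the delimiter line ending in '\n' (for example when the delimiters come from somewhere other than fgets), a common alternative is to strip the newline from the input before splitting. A minimal sketch; the helper name chomp is hypothetical, while the strcspn idiom itself is standard C.

```c
#include <string.h>

/* Hypothetical helper: cut the string at the first '\n', which is
   where fgets leaves the newline. strcspn returns strlen(s) when
   there is no '\n', so the string is then left unchanged. */
void chomp(char *s)
{
    s[strcspn(s, "\n")] = '\0';
}
```

Calling chomp (string) right after the first fgets would make the '\n' in delim unnecessary for splitting.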

Memory Use/Error Check

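In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address of the block, so that (2) it can be freed when it is no longer needed.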

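It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, do not attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.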

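For Linux, valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use: just run your program through one.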

$ valgrind ./bin/stringsplitdelim
==12635== Memcheck, a memory error detector
==12635== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==12635== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==12635== Command: ./bin/stringsplitdelim
==12635==
enter string    : my dog has fleas and my cat has none and snakes don't have fleas
enter delimiters:
 word[ 0]: my
 word[ 1]: dog
 word[ 2]: has
 word[ 3]: fleas
 word[ 4]: and
 word[ 5]: my
 word[ 6]: cat
 word[ 7]: has
 word[ 8]: none
 word[ 9]: and
 word[10]: snakes
 word[11]: don't
 word[12]: have
 word[13]: fleas
==12635==
==12635== HEAP SUMMARY:
==12635==     in use at exit: 0 bytes in 0 blocks
==12635==   total heap usage: 17 allocs, 17 frees, 323 bytes allocated
==12635==
==12635== All heap blocks were freed -- no leaks are possible
==12635==
==12635== For counts of detected and suppressed errors, rerun with: -v
==12635== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Always confirm that you have freed all the memory you have allocated and that there are no memory errors.

Look things over and let me know if you have further questions.
