How do I split a string on a delimiter longer than one char?

Date: 2010-03-28 01:41:05

Tags: c string string-split

Suppose I have this:

"foo bar 1 and foo bar 2"

How can I split it into:

foo bar 1
foo bar 2

I tried strtok() and strsep(), but neither works. They don't recognize "and" as a delimiter; they treat "a", "n" and "d" each as a separate delimiter.

Is there a function that helps with this, or do I have to split on spaces and do some string manipulation myself?

4 answers:

Answer 0 (score: 5)

You can use strstr() to find the first "and", "tokenize" the string yourself by skipping past that many characters, and then repeat.

Answer 1 (score: 5)

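The main problem with splitting strings in C is that it inevitably involves some dynamic memory management, which the standard library tends to avoid wherever possible. That is why, apart from malloc/calloc/realloc themselves, the classic standard C string functions leave memory allocation to the caller.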

But it's not hard to do yourself. Let me walk you through it.

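We need to return several strings, and the easiest way to do that is to return an array of pointers to strings, terminated by a NULL item. Except for the final NULL, every element of the array points to a dynamically allocated string.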

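First, we need some helper functions for handling such arrays. The simplest one computes the number of strings (the number of elements before the final NULL):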

/* Return length of a NULL-delimited array of strings. */
size_t str_array_len(char **array)
{
    size_t len;

    for (len = 0; array[len] != NULL; ++len)
        continue;
    return len;
}

Another simple one is a function that frees the array:

/* Free a dynamic array of dynamic strings. */
void str_array_free(char **array)
{
    if (array == NULL)
        return;
    for (size_t i = 0; array[i] != NULL; ++i)
        free(array[i]);
    free(array);
}

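More complicated is the function that appends a copy of a string to the array. It has to handle some special cases, such as when the array does not exist yet (the whole array is NULL). It also has to handle strings that are not '\0'-terminated, so that our actual split function can simply pass in a portion of the input string when appending.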

/* Append an item to a dynamically allocated array of strings. On failure,
   return NULL, in which case the original array is intact. The item
   string is dynamically copied. If the array is NULL, allocate a new
   array. Otherwise, extend the array. Make sure the array is always
   NULL-terminated. Input string might not be '\0'-terminated. */
char **str_array_append(char **array, size_t nitems, const char *item, 
                        size_t itemlen)
{
    /* Make a dynamic copy of the item. */
    char *copy;
    if (item == NULL)
        copy = NULL;
    else {
        copy = malloc(itemlen + 1);
        if (copy == NULL)
            return NULL;
        memcpy(copy, item, itemlen);
        copy[itemlen] = '\0';
    }

    /* Grow the array so it holds the nitems existing items, the new item,
       and the terminating NULL. When array is NULL, realloc behaves like
       malloc, so this also creates a brand-new array. */
    array = realloc(array, (nitems + 2) * sizeof(array[0]));
    if (array == NULL) {
        free(copy);
        return NULL;
    }

    /* Add copy of item to array, and return it. */
    array[nitems] = copy;
    array[nitems+1] = NULL;
    return array;
}

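That's a mouthful. For really good style, it would be better to move the making of the dynamic copy of the input item into its own function, but I'll leave that as an exercise for the reader.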
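One way to do that exercise, as a sketch: move the copying into a helper named `str_copy_n` (a hypothetical name; POSIX and C23 `strndup` is similar, but it is not available on every platform), and let the append function only grow the array. The behavior is meant to match the version above, including appending a NULL slot when `item` is NULL.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: dynamically copy the first len bytes of s and
   '\0'-terminate the copy. s need not be '\0'-terminated.
   Returns NULL if s is NULL or memory runs out. */
char *str_copy_n(const char *s, size_t len)
{
    if (s == NULL)
        return NULL;
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, len);
    copy[len] = '\0';
    return copy;
}

/* Same contract as str_array_append above, with the copying delegated. */
char **str_array_append(char **array, size_t nitems, const char *item,
                        size_t itemlen)
{
    char *copy = str_copy_n(item, itemlen);
    if (item != NULL && copy == NULL)
        return NULL;                 /* out of memory */

    /* Room for the existing items, the new one, and the NULL terminator. */
    char **grown = realloc(array, (nitems + 2) * sizeof(array[0]));
    if (grown == NULL) {
        free(copy);
        return NULL;
    }
    grown[nitems] = copy;
    grown[nitems + 1] = NULL;
    return grown;
}
```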

Finally, we have the actual split function. It, too, has to handle some special cases:

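  • The input string might start or end with the separator.
  • There might be separators right next to each other.
  • The input string might not contain the separator at all.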

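I chose to add an empty string to the result if a separator is right at the start or end of the input string, or right next to another separator. If you need something else, you'll have to adjust the code.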

Apart from the special cases and some error handling, the splitting itself is now fairly straightforward.

/* Split a string into substrings. Return dynamic array of dynamically
   allocated substrings, or NULL if there was an error. Caller is
   expected to free the memory, for example with str_array_free. */
char **str_split(const char *input, const char *sep)
{
    size_t nitems = 0;
    char **array = NULL;
    const char *start = input;
    const char *next;
    size_t seplen = strlen(sep);
    const char *item;
    size_t itemlen;

    /* An empty separator matches everywhere and would loop forever. */
    if (seplen == 0)
        return NULL;

    for (;;) {
        next = strstr(start, sep);
        if (next == NULL) {
            /* Add the remaining string (or empty string, if input ends with
               separator). */
            char **new = str_array_append(array, nitems, start, strlen(start));
            if (new == NULL) {
                str_array_free(array);
                return NULL;
            }
            array = new;
            ++nitems;
            break;
        } else if (next == input) {
            /* Input starts with separator. */
            item = "";
            itemlen = 0;
        } else {
            item = start;
            itemlen = next - item;
        }
        char **new = str_array_append(array, nitems, item, itemlen);
        if (new == NULL) {
            str_array_free(array);
            return NULL;
        }
        array = new;
        ++nitems;
        start = next + seplen;
    }

    if (nitems == 0) {
        /* Input does not contain separator at all. */
        assert(array == NULL);
        array = str_array_append(array, nitems, input, strlen(input));
    }

    return array;
}

Here is the whole thing as one complete program. It also includes a main function that runs some test cases.

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>


/* Append an item to a dynamically allocated array of strings. On failure,
   return NULL, in which case the original array is intact. The item
   string is dynamically copied. If the array is NULL, allocate a new
   array. Otherwise, extend the array. Make sure the array is always
   NULL-terminated. Input string might not be '\0'-terminated. */
char **str_array_append(char **array, size_t nitems, const char *item, 
                        size_t itemlen)
{
    /* Make a dynamic copy of the item. */
    char *copy;
    if (item == NULL)
        copy = NULL;
    else {
        copy = malloc(itemlen + 1);
        if (copy == NULL)
            return NULL;
        memcpy(copy, item, itemlen);
        copy[itemlen] = '\0';
    }

    /* Grow the array so it holds the nitems existing items, the new item,
       and the terminating NULL. When array is NULL, realloc behaves like
       malloc, so this also creates a brand-new array. */
    array = realloc(array, (nitems + 2) * sizeof(array[0]));
    if (array == NULL) {
        free(copy);
        return NULL;
    }

    /* Add copy of item to array, and return it. */
    array[nitems] = copy;
    array[nitems+1] = NULL;
    return array;
}


/* Free a dynamic array of dynamic strings. */
void str_array_free(char **array)
{
    if (array == NULL)
        return;
    for (size_t i = 0; array[i] != NULL; ++i)
        free(array[i]);
    free(array);
}


/* Split a string into substrings. Return dynamic array of dynamically
   allocated substrings, or NULL if there was an error. Caller is
   expected to free the memory, for example with str_array_free. */
char **str_split(const char *input, const char *sep)
{
    size_t nitems = 0;
    char **array = NULL;
    const char *start = input;
    const char *next;
    size_t seplen = strlen(sep);
    const char *item;
    size_t itemlen;

    /* An empty separator matches everywhere and would loop forever. */
    if (seplen == 0)
        return NULL;

    for (;;) {
        next = strstr(start, sep);
        if (next == NULL) {
            /* Add the remaining string (or empty string, if input ends with
               separator). */
            char **new = str_array_append(array, nitems, start, strlen(start));
            if (new == NULL) {
                str_array_free(array);
                return NULL;
            }
            array = new;
            ++nitems;
            break;
        } else if (next == input) {
            /* Input starts with separator. */
            item = "";
            itemlen = 0;
        } else {
            item = start;
            itemlen = next - item;
        }
        char **new = str_array_append(array, nitems, item, itemlen);
        if (new == NULL) {
            str_array_free(array);
            return NULL;
        }
        array = new;
        ++nitems;
        start = next + seplen;
    }

    if (nitems == 0) {
        /* Input does not contain separator at all. */
        assert(array == NULL);
        array = str_array_append(array, nitems, input, strlen(input));
    }

    return array;
}


/* Return length of a NULL-delimited array of strings. */
size_t str_array_len(char **array)
{
    size_t len;

    for (len = 0; array[len] != NULL; ++len)
        continue;
    return len;
}


#define MAX_OUTPUT 20


int main(void)
{
    struct {
        const char *input;
        const char *sep;
        char *output[MAX_OUTPUT];
    } tab[] = {
        /* Input is empty string. Output should be a list with an empty 
           string. */
        {
            "",
            "and",
            {
                "",
                NULL,
            },
        },
        /* Input is exactly the separator. Output should be two empty 
           strings. */
        {
            "and",
            "and",
            {
                "",
                "",
                NULL,
            },
        },
        /* Input is non-empty, but does not have separator. Output should
           be the same string. */
        {
            "foo",
            "and",
            {
                "foo",
                NULL,
            },
        },
        /* Input is non-empty, and does have separator. */
        {
            "foo bar 1 and foo bar 2",
            " and ",
            {
                "foo bar 1",
                "foo bar 2",
                NULL,
            },
        },
    };
    const int tab_len = sizeof(tab) / sizeof(tab[0]);
    bool errors;

    errors = false;

    for (int i = 0; i < tab_len; ++i) {
        printf("test %d\n", i);

        char **output = str_split(tab[i].input, tab[i].sep);
        if (output == NULL) {
            fprintf(stderr, "output is NULL\n");
            errors = true;
            break;
        }
        size_t num_output = str_array_len(output);
        printf("num_output %lu\n", (unsigned long) num_output);

        size_t num_correct = str_array_len(tab[i].output);
        if (num_output != num_correct) {
            fprintf(stderr, "wrong number of outputs (%lu, not %lu)\n",
                    (unsigned long) num_output, (unsigned long) num_correct);
            errors = true;
        } else {
            for (size_t j = 0; j < num_output; ++j) {
                if (strcmp(tab[i].output[j], output[j]) != 0) {
                    fprintf(stderr, "output[%lu] is '%s' not '%s'\n",
                            (unsigned long) j, output[j], tab[i].output[j]);
                    errors = true;
                    break;
                }
            }
        }

        str_array_free(output);
        printf("\n");
    }

    if (errors)
        return EXIT_FAILURE;   
    return 0;
}

Answer 2 (score: 2)

Here's a nice short example I just wrote showing how to use strstr to split a string on a given string:

#include <string.h>
#include <stdio.h>

void split(const char *phrase, const char *delimiter)
{
    const char *loc = strstr(phrase, delimiter);
    if (loc == NULL)
    {
        printf("Could not find delimiter\n");
    }
    else
    {
        char buf[256]; /* malloc would be much more robust here */
        size_t before = (size_t)(loc - phrase);
        size_t length = strlen(delimiter);
        if (before >= sizeof buf)
            before = sizeof buf - 1;  /* truncate instead of overflowing buf */
        memcpy(buf, phrase, before);
        buf[before] = '\0';           /* strncpy would not add this terminator */
        printf("Before delimiter: '%s'\n", buf);
        printf("After delimiter: '%s'\n", loc + length);
    }
}

int main(void)
{
    split("foo bar 1 and foo bar 2", "and");
    printf("-----\n");
    split("foo bar 1 and foo bar 2", "quux");
    return 0;
}

Output:

Before delimiter: 'foo bar 1 '
After delimiter: ' foo bar 2'
-----
Could not find delimiter

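Of course, I haven't tested it thoroughly: it only splits at the first occurrence of the delimiter, and the fixed-size buffer is a weak point for long inputs. But it is at least a working example.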

Answer 3 (score: 0)

If you know what kind of delimiter it is, for example a comma or a semicolon, you can try this:

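#include <stdio.h>

int main(void)
{
    int i = 0, temp = 0, temp1 = 0, temp2 = 0;
    char buff[12] = "123;456;789";

    /* First number: digits up to the first ';'. */
    for (; buff[i] != ';'; i++)
        temp = temp * 10 + (buff[i] - '0');
    i++;  /* skip the ';' */

    /* Second number: continue from where the first loop stopped. */
    for (; buff[i] != ';'; i++)
        temp1 = temp1 * 10 + (buff[i] - '0');
    i++;  /* skip the ';' */

    /* Third number: read up to the end of the string. */
    for (; buff[i] != '\0'; i++)
        temp2 = temp2 * 10 + (buff[i] - '0');

    /* getch() from <conio.h> dropped: it is not standard C. */
    printf("temp=%d temp1=%d temp2=%d\n", temp, temp1, temp2);
    return 0;
}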

Output:

temp=123 temp1=456 temp2=789