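Error when splitting a long string into shorter strings with strtok

Time: 2018-06-07 18:16:40

Tags: c parsing char strtok

I have a function in which I am trying to split a string, but I do not understand why it stops when it reads spaces.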

input.csv: 18820218,Northern Ireland,England,0,13,Friendly,Belfast,Ireland,FALSE

output.txt: 18820218,Northern,(null),(null),(null),(null),(null),(null),(null)

typedef struct
{
    long int date;
    char *h_team;
    char *a_team;
    int home_score;
    int away_score;
    char *reason;
    char *city;
    char *country;
    char *neutral_field;

}Data;


void open_output(char *string, FILE **output)
{       
    if((*output=fopen(string, "w")) == NULL)
    {
        printf("%s not found\n", string);
            exit(1);
    }
}

void alloc_Data(Data *d, int size)
{
    d->line1 = (char*)malloc(50*sizeof(char)); 
    d->h_team = (char*)malloc(30*sizeof(char)); 
    d->a_team = (char*)malloc(30*sizeof(char)); 
    d->reason = (char*)malloc(30*sizeof(char)); 
    d->city = (char*)malloc(30*sizeof(char)); 
    d->country = (char*)malloc(30*sizeof(char)); 
    d->neutral_field = (char*)malloc(9*sizeof(char)); 
}

void store(Data *d, FILE *output)
{
    char *string = "18820218,Northern Ireland,England,0,13,Friendly,"
                    "Belfast,Ireland,FALSE";
    char *char_date = malloc(10*sizeof(char));
    char *char_hscore = malloc(20*sizeof(char));
    char *char_ascore = malloc(3*sizeof(char));

    char *token;

    token = strtok(string, ",");    
    char_date = token;

    token = strtok(NULL, ",");
    d->h_team = token;  

    token = strtok(NULL, ",");
    d->a_team = token;  

    token = strtok(NULL, ",");
    char_hscore = token;

    token = strtok(NULL, ",");
    char_ascore = token;    

    token = strtok(NULL, ",");
    d->reason = token;  

    token = strtok(NULL, ",");
    d->city = token;    

    token = strtok(NULL, ",");
    d->country = token; 

    token = strtok(NULL, ",");
    d->neutral_field = token;   

    d->date = atoi(char_date);
    d->home_score = atoi(char_hscore);
    d->away_score = atoi(char_ascore);

    fprintf(output, "%li,%s,%s,%d,%d,%s,%s,%s,%s\n", d->date, d->h_team, 
            d->a_team, d->home_score, d->away_score, d->reason, d->city, 
            d->country, d->neutral_field );

    free(string);
    free(char_date);
    free(char_hscore);
    free(char_ascore);
}

int main(int argc, char *argv[])
{
    FILE *output;
    char *string = "saida.txt";

    open_output(string, &output);   

    Data *d;
    d = (Data*)malloc(sizeof(Data)); 
    alloc_Data(d);

    store(d, output);

    free(d);

    return 0;
}

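2 Answers:

Answer 0 (score: 2)

Ana, I have watched your question change over its last few iterations, and it is clear you know which pieces you need to put together, but you are making it somewhat harder on yourself than it needs to be in the way you are trying to fit them together.

The purpose of dynamically allocating structs or data is (1) to handle larger amounts of data than will fit on your program stack (not an issue here), (2) to allow you to grow or shrink the amount of storage you are using as your data needs fluctuate over the course of your program (also not an issue here), or (3) to allow you to tailor your storage to the data being used in your program. This last part seems to be what you are attempting, but by allocating a fixed size for your character arrays you completely lose the benefit of tailoring your allocation to the size of your data.

In order to allocate storage for each of the strings contained in your data, you need to get the length of each string and then allocate length + 1 characters for storage (the +1 for the nul-terminating character). While you can use malloc and then strcpy to allocate and copy to the new block of memory, if you have strdup, it can do both for you in one function call.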

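The quandary you face is "Where do I store the data before I get the lengths and allocate a copy?" You can handle this in a number of ways. You can declare a jumble of different variables and parse the data into the individual variables to begin with (somewhat messy), you can allocate one struct with fixed sizes to hold the values initially (a good option, but calling malloc for 30 or 50 chars does not make much sense when a fixed array will do), or you can declare a separate temporary struct with fixed array sizes (that way you collect the jumble of separate variables into one struct that can then easily be passed to your allocation function). Consider each and use the one that works best for you.

Your function return types do not make much sense as they are. You need to choose a meaningful return type that allows the function to indicate whether it succeeded or failed, and then return a value (or a pointer to a value) that provides useful information to the rest of your program. Gauging the success or failure of a function is particularly important for functions that allocate memory or handle input or output.

In addition to the return type you choose, you need to think about the parameters you pass to each function, that is, which variables need to be available inside the function. Take your FILE* parameter. You never use the file outside of your store() function, so why declare it in main(), which forces you to worry about returning the open stream through a pointer that you do not use?

With that in mind, we can look at putting the pieces of your program together in a slightly different manner.

First, do not use magic numbers throughout your code (e.g. 9, 10, 20, 30, 50, etc.). Instead,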

#define MAXN  9     /* if you need constants, define one (or more) */
#define MAXC 30
#define MAXL 50

(or you can use an enum for the same purpose)

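For the purposes of the example, you can use a dynamically allocated struct for efficient storage of the data and a temporary struct to help parse the values from your line of data. For example: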

typedef struct {    /* struct to hold dynamically allocated data */
    long date;      /* sized to exact number of chars required. */
    int home_score,
        away_score;
    char *h_team,
        *a_team,
        *reason,
        *city,
        *country,
        *neutral_field;
} data_t;

typedef struct {    /* temp struct to parse data from line */
    long date;      /* sized to hold largest anticipated data */
    int home_score,
        away_score;
    char h_team[MAXC],
        a_team[MAXC],
        reason[MAXC],
        city[MAXC],
        country[MAXC],
        neutral_field[MAXN];
} data_tmp_t;

Next, the entire purpose of your open_output function is to open a file for writing. It should return the open file stream on success, or NULL otherwise, e.g.

/* pass filename to open, returns open file stream pointer on
 * success, NULL otherwise.
 */
FILE *open_output (const char *string)
{       
    FILE *output = NULL;

    if ((output = fopen (string, "w")) == NULL)
        fprintf (stderr, "file open failed. '%s'.\n", string);

    return output;
}

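Your alloc_data function allocates the data struct and fills in its values. It should return a pointer to the fully allocated and filled struct on success, or NULL on failure, e.g.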

/* pass temporary struct containing data, dynamic struct allocated,
 * each member allocated to hold exact number of chars (+ terminating
 * character). pointer to allocated struct returned on success,
 * NULL otherwise.
 */
data_t *alloc_data (data_tmp_t *tmp)
{
    data_t *d = malloc (sizeof *d); /* allocate structure */

    if (d == NULL)
        return NULL;

    d->date = tmp->date;

    /* allocate each string member with strdup. if not available,
     * simply use malloc (strlen(str) + 1), and then strcpy.
     */
    if ((d->h_team = strdup (tmp->h_team)) == NULL)
        return NULL;
    if ((d->a_team = strdup (tmp->a_team)) == NULL)
        return NULL;

    d->home_score = tmp->home_score;
    d->away_score = tmp->away_score;

    if ((d->reason = strdup (tmp->reason)) == NULL)
        return NULL;
    if ((d->city = strdup (tmp->city)) == NULL)
        return NULL;
    if ((d->country = strdup (tmp->country)) == NULL)
        return NULL;
    if ((d->neutral_field = strdup (tmp->neutral_field)) == NULL)
        return NULL;

    return d;   /* return pointer to allocated struct */
}

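Whenever you allocate multiple values nested within a struct (or nested structs), get in the habit of writing a free_data function to free the memory you allocate in alloc_data. Writing one free function that properly handles the complex structure you have allocated is far preferable to scattering individual free calls around your code. There is no return to check when freeing a variable, so you can use a void function here:

/* frees each allocated member of d, and then d itself */
void free_data (data_t *d)
{
    free (d->h_team);
    free (d->a_team);
    free (d->reason);
    free (d->city);
    free (d->country);
    free (d->neutral_field);

    free (d);
}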

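Your store() function is where most of the decisions and validation checks take place. The purpose of your code is to parse the string and then store it in filename. That should get you thinking about which parameters are needed. The rest of the file handling can all be internal to store(), because the FILE is not used further in the calling function. Now, depending on how many writes you are doing, it may make perfect sense to declare and open the FILE* once in main() and then pass an open (and validated) FILE* parameter. In that case only a single fopen call and a single close are needed in main(). For the purposes here, everything is handled in store, so you can check for any stream error after each write by checking the return of fclose.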

Since you are allocating and storing a struct that may be needed for further use in the calling function, returning a pointer to the caller (or NULL on failure) is a good choice of return type for store(). You could do something like:

/* parses data in string into separate values and stores data in string
 * to filename (note: use mode "a" to append instead of "w" which
 * truncates). returns pointer to fully-allocated struct on success,
 * NULL otherwise.
 */
data_t *store (const char *string, const char *filename)
{
    data_tmp_t tmp = { .date = 0 };
    data_t *d = NULL;
    FILE *output = open_output (filename);  /* no need to pass in */
                                            /* not used later in main */
    if (output == NULL) {   /* validate file open for writing */
        return NULL;
    }

    /* parse csv values with sscanf - avoids later need to convert values
     * validate all values successfully converted.
     */
    if (sscanf (string, "%ld,%29[^,],%29[^,],%d,%d,%29[^,],%29[^,],"
                        "%29[^,],%8[^\n]",
                &tmp.date, tmp.h_team, tmp.a_team, &tmp.home_score,
                &tmp.away_score, tmp.reason, tmp.city, tmp.country,
                tmp.neutral_field) != 9) {
        fprintf (stderr, "error: failed to parse string.\n");
        return NULL;
    }

    d = alloc_data (&tmp);  /* allocate d and deep-copy tmp to d */
    if (d == NULL) {        /* validate allocation/copy succeeded */
        perror ("malloc-alloc_data");
        return NULL;
    }

    /* output values to file */
    fprintf (output, "%ld,%s,%s,%d,%d,%s,%s,%s,%s\n", d->date, d->h_team,
            d->a_team, d->home_score, d->away_score, d->reason, d->city,
            d->country, d->neutral_field );

    if (fclose (output) == EOF) /* always validate close-after-write */
        perror ("stream error-output");

    return d;   /* return fully allocated/populated struct */
}

Your main() then only needs to handle the string to be parsed, the name of the file to write the data to, and a pointer to the fully allocated struct resulting from the parse, for further use. (It also takes the file to write to as the first argument to the program, or writes to "saida.txt" by default if no argument is given), e.g.

int main (int argc, char *argv[])
{
    char *string = "18820218,Northern Ireland,England,0,13,Friendly,"
                    "Belfast,Ireland,FALSE";
    /* filename set to 1st argument (or "saida.txt" by default) */
    char *filename = argc > 1 ? argv[1] : "saida.txt";
    data_t *d = NULL;

    d = store (string, filename);   /* store string in filename */

    if (d == NULL) {    /* validate struct returned */
        fprintf (stderr, "error: failed to store string.\n");
        return 1;
    }

    /* output struct values as confirmation of what was stored in file */
    printf ("stored: %ld,%s,%s,%d,%d,%s,%s,%s,%s\n", d->date, d->h_team,
            d->a_team, d->home_score, d->away_score, d->reason, d->city,
            d->country, d->neutral_field );

    free_data (d);  /* free all memory when done */

    return 0;
}

While nothing in the C standard mandates it, the customary coding style for C avoids camelCase or MixedCase variable names in favor of all lower-case, while reserving upper-case names for macros and constants. It is a matter of style, so it is entirely up to you, but failing to follow it can create the wrong first impression in some circles.

Putting it all together, you could do something like the following:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXN  9     /* if you need constants, define one (or more) */
#define MAXC 30
#define MAXL 50

typedef struct {    /* struct to hold dynamically allocated data */
    long date;      /* sized to exact number of chars required. */
    int home_score,
        away_score;
    char *h_team,
        *a_team,
        *reason,
        *city,
        *country,
        *neutral_field;
} data_t;

typedef struct {    /* temp struct to parse data from line */
    long date;      /* sized to hold largest anticipated data */
    int home_score,
        away_score;
    char h_team[MAXC],
        a_team[MAXC],
        reason[MAXC],
        city[MAXC],
        country[MAXC],
        neutral_field[MAXN];
} data_tmp_t;

/* pass filename to open, returns open file stream pointer on
 * success, NULL otherwise.
 */
FILE *open_output (const char *string)
{
    FILE *output = NULL;

    if ((output = fopen (string, "w")) == NULL)
        fprintf (stderr, "file open failed. '%s'.\n", string);

    return output;
}

/* pass temporary struct containing data, dynamic struct allocated,
 * each member allocated to hold exact number of chars (+ terminating
 * character). pointer to allocated struct returned on success,
 * NULL otherwise.
 */
data_t *alloc_data (data_tmp_t *tmp)
{
    data_t *d = malloc (sizeof *d); /* allocate structure */

    if (d == NULL)
        return NULL;

    d->date = tmp->date;

    /* allocate each string member with strdup. if not available,
     * simply use malloc (strlen(str) + 1), and then strcpy.
     */
    if ((d->h_team = strdup (tmp->h_team)) == NULL)
        return NULL;
    if ((d->a_team = strdup (tmp->a_team)) == NULL)
        return NULL;

    d->home_score = tmp->home_score;
    d->away_score = tmp->away_score;

    if ((d->reason = strdup (tmp->reason)) == NULL)
        return NULL;
    if ((d->city = strdup (tmp->city)) == NULL)
        return NULL;
    if ((d->country = strdup (tmp->country)) == NULL)
        return NULL;
    if ((d->neutral_field = strdup (tmp->neutral_field)) == NULL)
        return NULL;

    return d;   /* return pointer to allocated struct */
}

/* frees each allocated member of d, and then d itself */
void free_data (data_t *d)
{
    free (d->h_team);
    free (d->a_team);
    free (d->reason);
    free (d->city);
    free (d->country);
    free (d->neutral_field);

    free (d);
}

/* parses data in string into separate values and stores data in string
 * to filename (note: use mode "a" to append instead of "w" which
 * truncates). returns pointer to fully-allocated struct on success,
 * NULL otherwise.
 */
data_t *store (const char *string, const char *filename)
{
    data_tmp_t tmp = { .date = 0 };
    data_t *d = NULL;
    FILE *output = open_output (filename);  /* no need to pass in */
                                            /* not used later in main */
    if (output == NULL) {   /* validate file open for writing */
        return NULL;
    }

    /* parse csv values with sscanf - avoids later need to convert values
     * validate all values successfully converted.
     */
    if (sscanf (string, "%ld,%29[^,],%29[^,],%d,%d,%29[^,],%29[^,],"
                        "%29[^,],%8[^\n]",
                &tmp.date, tmp.h_team, tmp.a_team, &tmp.home_score,
                &tmp.away_score, tmp.reason, tmp.city, tmp.country,
                tmp.neutral_field) != 9) {
        fprintf (stderr, "error: failed to parse string.\n");
        return NULL;
    }

    d = alloc_data (&tmp);  /* allocate d and deep-copy tmp to d */
    if (d == NULL) {        /* validate allocation/copy succeeded */
        perror ("malloc-alloc_data");
        return NULL;
    }

    /* output values to file */
    fprintf (output, "%ld,%s,%s,%d,%d,%s,%s,%s,%s\n", d->date, d->h_team,
            d->a_team, d->home_score, d->away_score, d->reason, d->city,
            d->country, d->neutral_field );

    if (fclose (output) == EOF) /* always validate close-after-write */
        perror ("stream error-output");

    return d;   /* return fully allocated/populated struct */
}

int main (int argc, char *argv[])
{
    char *string = "18820218,Northern Ireland,England,0,13,Friendly,"
                    "Belfast,Ireland,FALSE";
    /* filename set to 1st argument (or "saida.txt" by default) */
    char *filename = argc > 1 ? argv[1] : "saida.txt";
    data_t *d = NULL;

    d = store (string, filename);   /* store string in filename */

    if (d == NULL) {    /* validate struct returned */
        fprintf (stderr, "error: failed to store string.\n");
        return 1;
    }

    /* output struct values as confirmation of what was stored in file */
    printf ("stored: %ld,%s,%s,%d,%d,%s,%s,%s,%s\n", d->date, d->h_team,
            d->a_team, d->home_score, d->away_score, d->reason, d->city,
            d->country, d->neutral_field );

    free_data (d);  /* free all memory when done */

    return 0;
}

Example Use/Output

$ ./bin/store_teams dat/saida.txt
stored: 18820218,Northern Ireland,England,0,13,Friendly,Belfast,Ireland,FALSE

Verify Output File

$ cat dat/saida.txt
18820218,Northern Ireland,England,0,13,Friendly,Belfast,Ireland,FALSE

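Memory Use/Error Check

There is no need to cast the return of malloc; it is unnecessary. See: Do I cast the result of malloc?

In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address of the block of memory so that (2) it can be freed when it is no longer needed.

It is imperative that you use a memory error checking program to ensure that you do not attempt to access memory or write beyond or outside the bounds of your allocated block, that you do not attempt to read or base a conditional jump on an uninitialized value, and finally to confirm that you free all the memory you have allocated.

For Linux, valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use: just run your program through it.

$ valgrind ./bin/store_teams dat/saida.txt
==16038== Memcheck, a memory error detector
==16038== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==16038== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==16038== Command: ./bin/store_teams dat/saida.txt
==16038==
stored: 18820218,Northern Ireland,England,0,13,Friendly,Belfast,Ireland,FALSE
==16038==
==16038== HEAP SUMMARY:
==16038==     in use at exit: 0 bytes in 0 blocks
==16038==   total heap usage: 8 allocs, 8 frees, 672 bytes allocated
==16038==
==16038== All heap blocks were freed -- no leaks are possible
==16038==
==16038== For counts of detected and suppressed errors, rerun with: -v
==16038== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)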

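Always confirm that you have freed all memory you have allocated and that there are no memory errors.

Hopefully this helps you see how to put the pieces of the puzzle together in a less jumbled way, how to focus on the parameters each function needs, and how to think about choosing a meaningful return type for each function. Look things over and let me know if you have further questions.

Answer 1 (score: 1)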

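The code as shown will not compile or build, for the following reasons:

  • The member d->line1 does not exist in the struct.
  • The function void alloc_Data(Data *d, int size) takes two arguments, but the call alloc_Data(d); passes only one.

Also, because the definition of the function open_output(string, &output); was not provided, nobody trying to help can run the code. (Everything beyond this point is based on assumptions.)

Beyond that...

This: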

    token = strtok(NULL, ",");
    d->h_team = token;  

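actually changes the address held by the previously malloc'ed pointer, which causes a memory leak. (This is because the original block can no longer be reached, and any subsequent call to free(d->h_team); would be passed an address that was never malloc'ed.)

This: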

    token = strtok(NULL, ",");
    strcpy(d->h_team,token);

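copies the content located at the address in token into the memory that d->h_team points to, which means you can still call free(d->h_team); when you have finished using it (avoiding the memory leak).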

To understand the failure you are seeing, this may help:

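    char *string = "18820218,Northern Ireland,England,0,13,Friendly,Belfast,Ireland,FALSE";
    char *workingbuf = NULL;

    /* strtok() writes into the buffer it parses, and a string literal
     * must not be modified, so tokenize a writable copy instead */
    workingbuf = strdup(string);
    token = strtok(workingbuf, ",");
    ...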

One last thought: it is a good idea to check the return of strtok() before assuming that token contains anything:

    token = strtok(NULL, ",");
    if(token)
    {
        d->h_team = token;
        ...  

Edit
After implementing the changes suggested above, including adding open_output, your code runs.