Tokenizing an external file

Time: 2018-11-24 02:32:26

Tags: c struct dynamic-memory-allocation strtok strcpy

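So, I've been stuck on how to tokenize the FIRST token and put that value into a struct. In my case, I'm trying to read lines from a file like the following:

TDV format: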

 TN     1424325600000   dn20t1kz0xrz    67.0    0.0  0.0     0.0    101872.0    262.5665
 TN     1422770400000   dn2dcstxsf5b    23.0    0.0  100.0   0.0    100576.0    277.8087
 TN     1422792000000   dn2sdp6pbb5b    96.0    0.0  100.0   0.0    100117.0    278.49207
 TN     1422748800000   dn2fjteh8e80    6.0     0.0  100.0   0.0    100661.0    278.28485
 TN     1423396800000   dn2k0y7ffcup    14.0    0.0  100.0   0.0    100176.0    282.02142 

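As you can see, there is a TN, which is a state code. In the function below, I need to be able to recognize that a line is for a particular state and send it to the struct.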

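This is the function I'm supposed to implement. I've commented a list of the things that need to be done in this function. I thought I was doing it right, but when I print it out, I find that something completely different is actually happening: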

void analyze_file(FILE *file, struct climate_info **states, int num_states)
{
    const int line_sz = 100;
    char line[line_sz];
    int counter = 0;
    char *token;

    while (fgets(line, line_sz, file) != NULL)
    {
        /* TODO: We need to do a few things here:
         *
         *       * Tokenize the line.
         *       * Determine what state the line is for. This will be the state
         *         code, stored as our first token.
         *       * If our states array doesn't have a climate_info entry for
         *         this state, then we need to allocate memory for it and put it
         *         in the next open place in the array. Otherwise, we reuse the
         *         existing entry.
         *       * Update the climate_info structure as necessary.
         */
        struct climate_info *states = malloc(sizeof(struct climate_info)*num_states);
        token = strtok(line," \n");
        strcpy(states->code, token);
        //printf("token: %s\n", token);

        while(token)
        {

            printf("token: %s\n", token);
            token = strtok(NULL, " \t");

        }
    }
    printf("%d\n",counter);

}

Here is the struct I defined:

struct climate_info
{
    char code[3];
    unsigned long num_records;
    long long millitime;
    char location[13];
    double humidity;
    double snow;
    double cloud;
    double lightning;
    long double pressure;
    double temperature;
};

Here is where I print the output, and this is where my program doesn't seem to recognize what the analyze_file function is doing:

void print_report(struct climate_info *states[], int num_states)
{
    printf("States found: ");
    int i;
    for (i = 0; i < num_states; ++i)
    {
        if (states[i] != NULL)
        {
            struct climate_info *info = states[i];
            printf("%s", info->code);
        }
    }
    printf("\n");
}

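The output should look like this: States found: TN. I'm able to tokenize my string and output every token of every line, but the problem comes when I try to give the struct values. In my analyze_file line strcpy(states->code, token); I'm trying to take the first token, which I know is the state code, and assign it to the space I allocated for the struct. As you can see from my print_report function, it doesn't seem to recognize that I'm sending values to climate_info. My question is: how do I fix my analyze_file function without changing the print_report function?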

1 Answer:

Answer 0 (score: 1)

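The difficulty you seem to be having figuring out what to do with "TN" comes from trying to store all the data read from each line in its own separate struct. As mentioned in the comments, that may be fine for reading data into a database, where the database provides the ability to query all records by state abbreviation, but it makes processing the data more awkward. Why?

When you store every record as a separate struct, there is no relationship between the state the data belongs to and the information stored, other than the struct's code member. That means if you want to search for or print the information for, e.g., "TN", you must loop over every single struct, checking whether its code member matches "TN". Consider printing: you would have to loop over each state, and then, for each one, loop over every struct again to pick out the information to print for that single state.

Instead of storing each record of information as an element in an array of records, why not have an array of states, where each state holds a pointer to the data for that state? That makes your num_records member far more meaningful. Then you only need to loop over the array of states, check whether (num_records > 0), and print the num_records records for that state, skipping any state with no stored data. That is a much more efficient approach.

For example, with very little effort you can rearrange your structs slightly to create a relationship between each state and the data associated with it, e.g.: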

#include <stdio.h>
#include <stdlib.h>

/* if you need constants, either #define them or use an enum */
enum { ABRV = 2, NDATA = 8, LOC = 13, NAME = 15, MAXC = 1024 };
...
typedef struct {            /* struct holding only climate data */
    long long millitime;
    char location[LOC];
    double humidity;
    double snow;
    double cloud;
    double lightning;
    long double pressure;
    double temperature;
} climate_t;

typedef struct {
    size_t  num_allocated,  /* track of how many data are allocated */
            num_records;
    climate_t *data;        /* a pointer to allocated block for data */
} statedata_t;

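But how do you connect the "TN" read from the file with storing the data under the correct state? This is where a lookup table comes in. If you have another simple struct holding the state names and abbreviations, you can create a simple constant array of those structs. Then, when you read e.g. "TN" from the file, you can simply look up the index where "TN" sits in the array of abbreviations, and use that index to store the information from that line at the corresponding index in your array of statedata_t.

Since your lookup array will be constant, it can simply be a global variable declared const. If you are using multiple source files, just define the array in one file and declare it extern in the remaining files that need it. So how would you define it? First declare a struct with the information you want (the state name and abbreviation), then declare a constant array, initializing the name and abbreviation for each state, e.g.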

typedef struct {
    char name[NAME+1],
        abrv[ABRV+1];
} stateabrv_t;
...
const stateabrv_t state[]  =  { { "Alabama",        "AL" },
                                { "Alaska",         "AK" },
                                { "Arizona",        "AZ" },
                                { "Arkansas",       "AR" },
                                { "California",     "CA" },
                                { "Colorado",       "CO" },
                                { "Connecticut",    "CT" },
                                { "Delaware",       "DE" },
                                { "Florida",        "FL" },
                                { "Georgia",        "GA" },
                                { "Hawaii",         "HI" },
                                { "Idaho",          "ID" },
                                { "Illinois",       "IL" },
                                { "Indiana",        "IN" },
                                { "Iowa",           "IA" },
                                { "Kansas",         "KS" },
                                { "Kentucky",       "KY" },
                                { "Louisiana",      "LA" },
                                { "Maine",          "ME" },
                                { "Maryland",       "MD" },
                                { "Massachusetts",  "MA" },
                                { "Michigan",       "MI" },
                                { "Minnesota",      "MN" },
                                { "Mississippi",    "MS" },
                                { "Missouri",       "MO" },
                                { "Montana",        "MT" },
                                { "Nebraska",       "NE" },
                                { "Nevada",         "NV" },
                                { "New Hampshire",  "NH" },
                                { "New Jersey",     "NJ" },
                                { "New Mexico",     "NM" },
                                { "New York",       "NY" },
                                { "North Carolina", "NC" },
                                { "North Dakota",   "ND" },
                                { "Ohio",           "OH" },
                                { "Oklahoma",       "OK" },
                                { "Oregon",         "OR" },
                                { "Pennsylvania",   "PA" },
                                { "Rhode Island",   "RI" },
                                { "South Carolina", "SC" },
                                { "South Dakota",   "SD" },
                                { "Tennessee",      "TN" },
                                { "Texas",          "TX" },
                                { "Utah",           "UT" },
                                { "Vermont",        "VT" },
                                { "Virginia",       "VA" },
                                { "Washington",     "WA" },
                                { "West Virginia",  "WV" },
                                { "Wisconsin",      "WI" },
                                { "Wyoming",        "WY" } };

const int nstates = sizeof state / sizeof *state;

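Now you have a simple 2-way lookup. Given a state name or abbreviation, you can return the index of its position in the array. Moreover, given the name you can look up the abbreviation, and given the abbreviation you can look up the name.

A simple lookup function that returns the index could be: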

/* simple lookup function, given a code s, return index for state
 * in array of statedata_t on success, -1 otherwise.
 */
int lookupabrv (const char *s)
{
    int i = 0;

    for (; i < nstates; i++)
        if (state[i].abrv[0] == s[0] && state[i].abrv[1] == s[1])
            return i;

    return -1;
}

Now that you can find the index for a given abbreviation using the global lookup table, you can put the rest of your data handling together in main() by declaring an array of 50 statedata_t (one for each state), e.g.

int main (int argc, char **argv) {

    char buf[MAXC]; /* line buffer */
    /* array of 50 statedata_t (one for each state) */
    statedata_t stdata[sizeof state / sizeof *state] = {{.num_records = 0}};

Now you can start reading from your file and call insert_data to store each record under the proper state, based on the abbreviation read from the file. A simple way to approach the read is to read the "TN" into a separate array, and then read the climate data into a temporary struct of type climate_t that you can pass to your insert_data function. In insert_data, you simply look up the index (allocating or reallocating data as needed), and then assign your temporary struct to the next element of that state's data block. For example, your insert_data function could be something like the following:

/* insert data for state given code and climate_t containing data */
int insert_data (statedata_t *st, const char *code, climate_t *data)
{
    int index = lookupabrv (code);  /* lookup array index */

    if (index == -1)    /* handle error */
        return 0;

    if (!st[index].num_allocated) { /* allocate data if not allocated */
        st[index].data = malloc (NDATA * sizeof *st[index].data);
        if (!st[index].data) {
            perror ("malloc-st[index].data");
            return 0;
        }
        st[index].num_allocated = NDATA;
    }

    /* realloc (doubling) when full; use a temp pointer so the existing
     * block is not lost if realloc fails */
    if (st[index].num_records == st[index].num_allocated) {
        size_t newsz = 2 * st[index].num_allocated;
        void *tmp = realloc (st[index].data, newsz * sizeof *st[index].data);
        if (!tmp) {
            perror ("realloc-st[index].data");
            return 0;
        }
        st[index].data = tmp;
        st[index].num_allocated = newsz;
    }

    /* add data for proper state index */
    st[index].data[st[index].num_records++] = *data;

    return 1;   /* return success */
}

That's basically it. How you parse the information from each line is up to you, but for the purpose of the example, given your sample data, I simply use sscanf. Putting it all together, you could do something like the following:

#include <stdio.h>
#include <stdlib.h>

/* if you need constants, either #define them or use an enum */
enum { ABRV = 2, NDATA = 8, LOC = 13, NAME = 15, MAXC = 1024 };

typedef struct {
    char name[NAME+1],
        abrv[ABRV+1];
} stateabrv_t;

typedef struct {            /* struct holding only climate data */
    long long millitime;
    char location[LOC];
    double humidity;
    double snow;
    double cloud;
    double lightning;
    long double pressure;
    double temperature;
} climate_t;

typedef struct {
    size_t  num_allocated,  /* track of how many data are allocated */
            num_records;
    climate_t *data;        /* a pointer to allocated block for data */
} statedata_t;

const stateabrv_t state[]  =  { { "Alabama",        "AL" },
                                { "Alaska",         "AK" },
                                { "Arizona",        "AZ" },
                                { "Arkansas",       "AR" },
                                { "California",     "CA" },
                                { "Colorado",       "CO" },
                                { "Connecticut",    "CT" },
                                { "Delaware",       "DE" },
                                { "Florida",        "FL" },
                                { "Georgia",        "GA" },
                                { "Hawaii",         "HI" },
                                { "Idaho",          "ID" },
                                { "Illinois",       "IL" },
                                { "Indiana",        "IN" },
                                { "Iowa",           "IA" },
                                { "Kansas",         "KS" },
                                { "Kentucky",       "KY" },
                                { "Louisiana",      "LA" },
                                { "Maine",          "ME" },
                                { "Maryland",       "MD" },
                                { "Massachusetts",  "MA" },
                                { "Michigan",       "MI" },
                                { "Minnesota",      "MN" },
                                { "Mississippi",    "MS" },
                                { "Missouri",       "MO" },
                                { "Montana",        "MT" },
                                { "Nebraska",       "NE" },
                                { "Nevada",         "NV" },
                                { "New Hampshire",  "NH" },
                                { "New Jersey",     "NJ" },
                                { "New Mexico",     "NM" },
                                { "New York",       "NY" },
                                { "North Carolina", "NC" },
                                { "North Dakota",   "ND" },
                                { "Ohio",           "OH" },
                                { "Oklahoma",       "OK" },
                                { "Oregon",         "OR" },
                                { "Pennsylvania",   "PA" },
                                { "Rhode Island",   "RI" },
                                { "South Carolina", "SC" },
                                { "South Dakota",   "SD" },
                                { "Tennessee",      "TN" },
                                { "Texas",          "TX" },
                                { "Utah",           "UT" },
                                { "Vermont",        "VT" },
                                { "Virginia",       "VA" },
                                { "Washington",     "WA" },
                                { "West Virginia",  "WV" },
                                { "Wisconsin",      "WI" },
                                { "Wyoming",        "WY" } };

const int nstates = sizeof state / sizeof *state;

/* simple lookup function, given a code s, return index for state
 * in array of statedata_t on success, -1 otherwise.
 */
int lookupabrv (const char *s)
{
    int i = 0;

    for (; i < nstates; i++)
        if (state[i].abrv[0] == s[0] && state[i].abrv[1] == s[1])
            return i;

    return -1;
}

/* insert data for state given code and climate_t containing data */
int insert_data (statedata_t *st, const char *code, climate_t *data)
{
    int index = lookupabrv (code);  /* lookup array index */

    if (index == -1)    /* handle error */
        return 0;

    if (!st[index].num_allocated) { /* allocate data if not allocated */
        st[index].data = malloc (NDATA * sizeof *st[index].data);
        if (!st[index].data) {
            perror ("malloc-st[index].data");
            return 0;
        }
        st[index].num_allocated = NDATA;
    }

    /* realloc (doubling) when full; use a temp pointer so the existing
     * block is not lost if realloc fails */
    if (st[index].num_records == st[index].num_allocated) {
        size_t newsz = 2 * st[index].num_allocated;
        void *tmp = realloc (st[index].data, newsz * sizeof *st[index].data);
        if (!tmp) {
            perror ("realloc-st[index].data");
            return 0;
        }
        st[index].data = tmp;
        st[index].num_allocated = newsz;
    }

    /* add data for proper state index */
    st[index].data[st[index].num_records++] = *data;

    return 1;   /* return success */
}

/* print states with data collected */
void print_data (statedata_t *st)
{
    int i = 0;

    for (; i < nstates; i++) {
        if (st[i].num_records) {
            size_t j = 0;
            printf ("\n%s\n", state[i].name);
            for (; j < st[i].num_records; j++)
                printf ("  %13lld  %-12s %5.1f %5.1f %5.1f %5.1f %8.1Lf "
                        "%8.4f\n",
                        st[i].data[j].millitime, st[i].data[j].location,
                        st[i].data[j].humidity, st[i].data[j].snow,
                        st[i].data[j].cloud, st[i].data[j].lightning,
                        st[i].data[j].pressure, st[i].data[j].temperature);
        }
    }
}

/* free allocated memory */
void free_data (statedata_t *st)
{
    int i = 0;

    for (; i < nstates; i++)
        if (st[i].num_records)
            free (st[i].data);
}

int main (int argc, char **argv) {

    char buf[MAXC]; /* line buffer */
    /* array of 50 statedata_t (one for each state) */
    statedata_t stdata[sizeof state / sizeof *state] = {{.num_records = 0}};
    /* read from file given as argument (or stdin if none given) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

    while (fgets (buf, MAXC, fp)) {     /* read each line of data */
        char code[ABRV+1] = "";         /* declare storage for abbreviation */
        climate_t tmp = { .millitime = 0 }; /* declare temp struct for data */

        /* simple parse of data with sscanf */
        if (sscanf (buf, "%2s %lld %12s %lf %lf %lf %lf %Lf %lf", code,
            &tmp.millitime, tmp.location, &tmp.humidity, &tmp.snow,
            &tmp.cloud, &tmp.lightning, &tmp.pressure, &tmp.temperature)
            == 9) {
            if (!insert_data (stdata, code, &tmp))  /* insert data/validate */
                fprintf (stderr, "error: insert_data failed (%s).\n", code);
        }
        else    /* handle error */
            fprintf (stderr, "error: invalid format:\n%s\n", buf);
    }
    if (fp != stdin) fclose (fp);   /* close file if not stdin */

    print_data (stdata);    /* print data */
    free_data (stdata);     /* free allocated memory */

    return 0;
}

Example Input File

$ cat dat/state_climate.txt
TN 1424325600000 dn20t1kz0xrz 67.0 0.0 0.0 0.0 101872.0 262.5665
TN 1422770400000 dn2dcstxsf5b 23.0 0.0 100.0 0.0 100576.0 277.8087
TN 1422792000000 dn2sdp6pbb5b 96.0 0.0 100.0 0.0 100117.0 278.49207
TN 1422748800000 dn2fjteh8e80 6.0 0.0 100.0 0.0 100661.0 278.28485
TN 1423396800000 dn2k0y7ffcup 14.0 0.0 100.0 0.0 100176.0 282.02142

Example Use/Output

$ ./bin/state_climate <dat/state_climate.txt

Tennessee
  1424325600000  dn20t1kz0xrz  67.0   0.0   0.0   0.0 101872.0 262.5665
  1422770400000  dn2dcstxsf5b  23.0   0.0 100.0   0.0 100576.0 277.8087
  1422792000000  dn2sdp6pbb5b  96.0   0.0 100.0   0.0 100117.0 278.4921
  1422748800000  dn2fjteh8e80   6.0   0.0 100.0   0.0 100661.0 278.2849
  1423396800000  dn2k0y7ffcup  14.0   0.0 100.0   0.0 100176.0 282.0214

Memory Use/Error Check

In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address of the block so that (2) it can be freed when it is no longer needed.

It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated blocks, do not attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.

For Linux, valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use: just run your program through it.

Always confirm that you have freed all the memory you have allocated and that there are no memory errors.

Look things over and consider why changing the structure makes sense. Let me know if you have any questions.