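Trouble splitting input from a file in C

Time: 2017-06-03 00:52:15

Tags: c file-io

I need to read input from a file and then split each capitalized word from its definition. My trouble is that I need multiple lines from the file collected into one variable so I can pass it to another function. The file I want to read looks like this: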

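ACHROMATIC. An optical term applied to those telescopes in which
aberration of the rays of light, and the colours dependent thereon, are
partially corrected. (See APLANATIC.)

ACHRONICAL. An ancient term, signifying the rising of the heavenly
bodies at sunset, or setting at sunrise.

ACROSS THE TIDE. A ship riding across tide, with the wind in the
direction of the tide, would tend to leeward of her anchor; but with a
weather tide, or that running against the wind, if the tide be strong,
would tend to windward. A ship under sail should prefer the tack that
stems the tide, with the wind across the stream, when the anchor is
let go.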

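Right now my code separates the word from the rest of the text, but I am having trouble getting the rest of the input into a single variable.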

while(fgets(line, sizeof(line), mFile) != NULL){
    if (strlen(line) != 2){
        if (isupper(line[0]) && isupper(line[1])){
            word = strtok(line, ".");
            temp = strtok(NULL, "\n");
            len = strlen(temp);
            for (i=0; i < len; i++){
                *(defn+i) = *(temp+i);
            }
            printf("Word: %s\n", word);
        }
        else{

            temp = strtok(line, "\n");
            for (i=len; i < strlen(temp) + len; i++);
                *(defn+i) = *(temp+i-len);
            len = len + strlen(temp);
            //printf(" %s\n", temp);
        }
    }
    else{
        len = 0;
        printf("%s\n", defn);
        index = 0;
    }
}

3 answers:

Answer 0 (score: 0)

Like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <assert.h>

//another function
void func(char *word, char *defs){
    printf("<%s>\n", word);
    if(defs){
        printf("%s", defs);
    }
}

int main(void){
    char buffer[4096], *curr = buffer;
    size_t len, buf_size = sizeof buffer;
    FILE *fp = fopen("dic.txt", "r");

    if(!fp){//bail out if the file could not be opened
        perror("dic.txt");
        return EXIT_FAILURE;
    }

    while(fgets(curr, buf_size, fp)){
        //check definition line
        if(*curr == '\n' || !isupper((unsigned char)*curr)){
            continue;//printf("invalid format\n");
        }
        len = strlen(curr);
        curr += len;
        buf_size -= len;
        //read rest line
        while(1){
            curr = fgets(curr, buf_size, fp);
            if(!curr || *curr == '\n'){//upto EOF or blank line
                char *word, *defs;
                char *p = strchr(buffer, '.');
                if(p)
                    *p++ = 0;
                word = buffer;
                defs = p;
                func(word, defs);
                break;
            }
            len = strlen(curr);
            curr += len;
            buf_size -= len;
            assert(buf_size >= 2 || (fprintf(stderr, "small buffer\n"), 0));
        }
        curr = buffer;
        buf_size = sizeof buffer;
    }
    fclose(fp);

    return 0;
}

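Answer 1 (score: 0)

It looks like you need to first pull a string of uppercase letters from the start of the line, up to the first period, then concatenate the rest of that line with the following lines until you hit a blank line. Lather, rinse, repeat as needed. While this task would be easier in Perl, if you need to do it in C, I suggest using the built-in string functions rather than building your own for loops to copy the data. Perhaps something like the following: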

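/* assumes: char *rest; declared alongside word, and defn large enough */
while(fgets(line, sizeof(line), mFile) != NULL) {
    if (strlen(line) > 2) {
        if (isupper((unsigned char)line[0]) && isupper((unsigned char)line[1])) {
            /* headword line: the text before the first '.' is the word */
            word = strtok(line, ".");
            rest = strtok(NULL, "\n");   /* NULL if nothing follows the '.' */
            strcpy(defn, rest ? rest : "");
            printf("Word: %s\n", word);
        } else {
            /* continuation line: append it to the definition so far */
            rest = strtok(line, "\n");
            if (rest)
                strcat(defn, rest);
        }
    } else {
        /* blank line ends the current entry */
        printf("%s\n", defn);
        defn[0] = 0;
    }
}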

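When I put this into a properly structured C program with the appropriate include files, it works fine. Personally I would approach the problem differently, but hopefully this helps you get unstuck.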
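Two things the strcpy/strcat approach leaves open are the size of the destination buffer and the missing space between joined lines. A minimal sketch of a bounded append, assuming defn is a fixed-size char array; append_piece is a hypothetical helper name, not something from the question or from this answer:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (an assumption, not part of the original code):
 * append 'piece' to the NUL-terminated string in 'dst', whose total
 * capacity is 'cap' bytes. One space is inserted first when 'dst' is
 * not empty. Returns 1 on success, 0 if the result would not fit
 * (in which case 'dst' is left unchanged). */
int append_piece(char *dst, size_t cap, const char *piece)
{
    size_t used = strlen(dst);
    size_t add  = strlen(piece);
    size_t sep  = used ? 1 : 0;

    if (used + sep + add + 1 > cap)       /* +1 for the terminating NUL */
        return 0;
    if (sep)
        dst[used++] = ' ';
    memcpy(dst + used, piece, add + 1);   /* copies the NUL as well */
    return 1;
}
```

With something like this, the strcat call in the loop could become append_piece(defn, sizeof defn, rest) for a non-NULL rest, and a return value of 0 tells you the definition was too long for the buffer.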

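Answer 2 (score: 0)

There are several areas to address. Based on your sample input and description, your goal is to develop a function that reads and separates each word (or phrase) and its associated definition, returns a pointer to the collection of words/definitions, and also updates a pointer to the number of words and definitions read so that the count is available back in the calling function (main here).

While your data suggests that a word and its definition are both contained in a single line of text, with the word (or phrase) written in uppercase, it is unclear whether you must also handle definitions that span multiple lines (which basically means you may have to read multiple lines and combine them to form the complete definition).

Whenever you need to maintain a relationship between multiple variables within a single object, a struct is a good choice for the underlying data object. With an array of structs, each word and its associated definition is accessible once read into memory. Your example has 3 words and definitions (each separated by a '\n'). Creating an array of 3 structs to hold the data is trivial, but when reading data such as a dictionary, you rarely know in advance how many words you will read.

To handle that situation, a dynamic array of structs is an appropriate data structure. You allocate space for some reasonable number of words/definitions, and if you reach that limit, you simply realloc the array containing the data, update your limit to reflect the new allocated size, and keep going.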
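The grow-on-demand step described above can be sketched in isolation. This is only an illustration under assumptions: the entry type, the CHUNK constant, and reserve_one are hypothetical names chosen for this sketch, and the full listing further down does the same job inline with its own names.

```c
#include <stdlib.h>
#include <string.h>

enum { MAXW = 64, CHUNK = 128 };   /* CHUNK: hypothetical growth step */

typedef struct {
    char word[MAXW];    /* fixed storage for the word or phrase */
    char *def;          /* separately allocated definition */
} entry;

/* Make sure the array has room for one more element beyond the 'used'
 * elements already filled. '*arr' may be NULL when '*cap' is 0.
 * Newly added elements are zeroed. Returns 1 on success, 0 if realloc
 * fails (the existing array is left intact in that case). */
int reserve_one(entry **arr, size_t *cap, size_t used)
{
    if (used < *cap)
        return 1;                          /* still room, nothing to do */

    size_t newcap = *cap + CHUNK;
    entry *tmp = realloc(*arr, newcap * sizeof *tmp);
    if (!tmp)
        return 0;                          /* keep the old block usable */

    memset(tmp + *cap, 0, (newcap - *cap) * sizeof *tmp);
    *arr = tmp;
    *cap = newcap;
    return 1;
}
```

Assigning the result of realloc to a temporary pointer first is the important part: if the call fails, the original block is still valid and can be freed or returned as-is.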

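While you could use strtok to separate the word (or phrase) by looking for the first '.', that is a bit of overkill. You need to iterate over each char anyway to check that they are all uppercase, so you may as well iterate until you find the '.' and use the index of that character as the number of characters to store as your word, then set a pointer to the next character after the '.'. From there you look for the start of the definition (you basically want to skip any character that is not [a-zA-Z]). Once you find the start of the definition, you can simply get the length of the rest of the line and copy it as the definition (or, if the definition spans several separate lines, copy it as the first part).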
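As a rough illustration of that scan, here is a small standalone function. It is a sketch under assumptions: split_headword is a hypothetical name, and it treats only the ASCII letters A-Z and spaces as valid headword characters, which matches the sample data but may not match every dictionary file.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if 'line' begins with an upper-case headword
 * (ASCII A-Z and spaces only) terminated by '.', copy the headword
 * into 'word' (capacity 'cap' bytes) and return a pointer to the first
 * letter of the text after the '.'. The returned pointer may point at
 * the terminating NUL if no definition text follows on this line.
 * Returns NULL if the line is not a headword line or does not fit. */
const char *split_headword(const char *line, char *word, size_t cap)
{
    size_t i = 0;

    if (line[0] < 'A' || line[0] > 'Z')    /* must start with a capital */
        return NULL;
    while (line[i] == ' ' || (line[i] >= 'A' && line[i] <= 'Z'))
        i++;                               /* scan the headword */
    if (line[i] != '.' || i >= cap)        /* need '.', and room for NUL */
        return NULL;

    memcpy(word, line, i);
    word[i] = '\0';

    const char *p = line + i + 1;          /* first char after the '.' */
    while (*p && !((*p >= 'A' && *p <= 'Z') || (*p >= 'a' && *p <= 'z')))
        p++;                               /* skip anything not a letter */
    return p;
}
```

A continuation line such as "aberration of the rays of light, ..." returns NULL because it does not start with a capital letter, and a line such as "A ship under sail ..." returns NULL because the scan stops at a lower-case letter before reaching a '.'.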

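After reading the file, returning the pointer, and updating the pointer to the number of words, you can use the array of structs in main however you like. When you are done with the information, you should free all the memory you have allocated.

Since the size of the largest word or phrase is generally known, the struct used provides static storage for the word. Given that definitions can vary greatly in length and be much longer, the struct simply contains a pointer to char for the definition. So you must allocate storage for the array of structs, and then allocate storage for each definition within each struct.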
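Because each definition lives in its own allocation, joining a definition that spans several lines comes down to growing that one block as each line arrives. A minimal sketch of that idea, assuming the caller keeps the current length next to the pointer; def_append is a hypothetical helper name and is not how the listing below is organized:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: append 'piece' to the heap string '*def',
 * growing it with realloc. '*def' may be NULL, in which case '*len'
 * must be 0. A single space separates the pieces. '*len' is kept equal
 * to the string length. Returns 1 on success, 0 if realloc fails
 * ('*def' and '*len' are left unchanged in that case). */
int def_append(char **def, size_t *len, const char *piece)
{
    size_t add = strlen(piece);
    size_t sep = *len ? 1 : 0;             /* space only between pieces */
    char *tmp = realloc(*def, *len + sep + add + 1);

    if (!tmp)
        return 0;
    if (sep)
        tmp[(*len)++] = ' ';
    memcpy(tmp + *len, piece, add + 1);    /* includes the NUL */
    *len += add;
    *def = tmp;
    return 1;
}
```

The caller strips the trailing '\n' from each line before passing it in, starts each new entry with a NULL pointer and a length of 0, and calls free on the pointer once the entry is no longer needed.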

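The following code does just that. It takes the filename to read as the first argument (or reads from stdin by default if no filename is given). The code outputs each word and definition on a single line. The code is heavily commented to help you follow along and to explain the logic, e.g.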

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum {MAXW = 64, NDEF = 128}; 

typedef struct {        /* struct holding words/definitions */
    char word[MAXW],
        *def;           /* you must allocate space for def */
} defn;

defn *readdict (FILE *fp, size_t *n);

int main (int argc, char **argv) {

    defn *defs = NULL;
    size_t n = 0;
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
        return 1;
    }

    if (!(defs = readdict (fp, &n))) { /* read words/defs into defs */
        fprintf (stderr, "readdict() error: no words read from file.\n");
        return 1;
    }

    if (fp != stdin) fclose (fp);     /* close file if not stdin */

    for (size_t i = 0; i < n; i++) {
        printf ("\nword: %s\n\ndefinition: %s\n", defs[i].word, defs[i].def);
        free (defs[i].def);         /* free allocated definitions */
    }
    free (defs);    /* free array of structs */

    return 0;
}

/** read word and associated definition from open file stream 'fp'
 *  into dynamic array of struct, updating pointer 'n' to contain
 *  the total number of defn structs filled.
 */
defn *readdict (FILE *fp, size_t *n)
{
    defn *defs = NULL;                     /* pointer to array of structs   */
    char buf[BUFSIZ] = "";                 /* buffer to hold each line read */
    size_t max = NDEF, haveword = 0, offset = 0;  /* allocated size & flags */

    /* allocate, initialize & validate memory to hold 'max' structs */
    if (!(defs = calloc (max, sizeof *defs))) {
        fprintf (stderr, "error: virtual memory exhausted.\n");
        return NULL;
    }

    while (fgets (buf, BUFSIZ, fp))     /* read each line of input */
    {
        if (*buf == '\n') {         /* check for blank line */
            if (haveword) (*n)++;   /* if word/def already read, increment n */
            haveword = 0;           /* reset haveword flag */
            if (*n == max) {
                void *tmp = NULL;   /* tmp ptr to realloc defs */
                if (!(tmp = realloc (defs, sizeof *defs * (max + NDEF)))) {
                    fprintf (stderr, "error: memory exhausted, realloc defs.\n");
                    break;
                }
                defs = tmp;     /* assign new block to defs */
                memset (defs + max, 0, NDEF * sizeof *defs); /* zero new mem */
                max += NDEF;    /* update max with current allocation size */
            }
            continue;           /* get next line */
        }

        if (haveword) {                 /* word already stored in defs[n].word */
            void *tmp = NULL;           /* tmp pointer to realloc */
            size_t dlen = strlen (buf); /* get line/buf length */
            if (buf[dlen - 1] == '\n')  /* trim '\n' from end */
                buf[--dlen] = 0;        /* realloc & validate */
            if (!(tmp = realloc (defs[*n].def, offset + dlen + 2))) {
                fprintf (stderr, 
                        "error: memory exhausted, realloc defs[%zu].def.\n", *n);
                break;
            }
            defs[*n].def = tmp;     /* assign new block, fill with definition */
            sprintf (defs[*n].def + offset, offset ? " %s" : "%s", buf);
            offset += dlen + (offset ? 1 : 0); /* advance (+1 only if ' ' added) */
        }
        else {                      /* no current word being defined */
            char *p = NULL;
            size_t i;
            for (i = 0; buf[i] && i < MAXW; i++) {   /* check first MAXW chars */
                if (buf[i] == '.') {         /* if a '.' is found, end of word */
                    size_t dlen = 0;
                    if (i + 1 == MAXW) {  /* check one char available for '\0' */
                        fprintf (stderr, 
                                 "error: 'word' exceeds MAXW, skipping.\n");
                        goto next;
                    }
                    strncpy (defs[*n].word, buf, i); /* copy i chars to .word  */
                    haveword = 1;                    /* set haveword flag      */
                    offset = 0;           /* reset .def offset for the new word */
                    p = buf + i + 1;    /* set p to next char in buf after '.' */
                    while (*p && (*p == ' ' || *p < 'A' ||   /* find def start */
                        ('Z' < *p && *p < 'a') || 'z' < *p))
                        p++;                    /* increment p and check again */
                    if ((dlen = strlen (p))) {  /* get definition length */
                        if (p[dlen - 1] == '\n') /* trim trailing '\n' */
                            p[--dlen] = 0;
                        if (!(defs[*n].def = malloc (dlen + 1))) { /* allocate */
                            fprintf (stderr, 
                                     "error: virtual memory exhausted.\n");
                            goto done;            /* bail if allocation failed */
                        }
                        strcpy (defs[*n].def, p);   /* copy definition to .def */
                        offset = dlen;         /* set offset in .def buf to be */
                    }                          /* used if def continues on a   */
                    break;                     /* new or separate line */
                }               /* check word is all upper-case or a ' ' */ 
                else if (buf[i] != ' ' && (buf[i] < 'A' || 'Z' < buf[i]))
                    break;
            }
        }
        next:;
    }
    done:;

    if (haveword) (*n)++;   /* account for last word/definition */

    return defs;            /* return pointer to array of struct */
}

Example Use/Output

$ ./bin/dict_read <dat/dict.txt

word: ACHROMATIC

definition: An optical term applied to those telescopes in which 
aberration of the rays of light, and the colours dependent thereon, 
are partially corrected. (See APLANATIC.)

word: ACHRONICAL

definition: An ancient term, signifying the rising of the heavenly 
bodies at sunset, or setting at sunrise.

word: ACROSS THE TIDE

definition: A ship riding across tide, with the wind in the direction 
of the tide, would tend to leeward of her anchor; but with a weather tide, 
or that running against the wind, if the tide be strong, would tend to 
windward. A ship under sail should prefer the tack that stems the tide, 
with the wind across the stream, when the anchor is let go.

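(Line breaks were inserted manually to keep the output tidy.)

Memory Use/Error Check

For any code you write that dynamically allocates memory, you should also run it through a memory use and error checking program (such as valgrind on Linux). Just run your code through it and confirm that you free all the memory you allocate and that there are no memory errors, e.g.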

$ valgrind ./bin/dict_read <dat/dict.txt
==31380== Memcheck, a memory error detector
==31380== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==31380== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==31380== Command: ./bin/dict_read
==31380==

word: ACHROMATIC

<snip output>
==31380==
==31380== HEAP SUMMARY:
==31380==     in use at exit: 0 bytes in 0 blocks
==31380==   total heap usage: 4 allocs, 4 frees, 9,811 bytes allocated
==31380==
==31380== All heap blocks were freed -- no leaks are possible
==31380==
==31380== For counts of detected and suppressed errors, rerun with: -v
==31380== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Look it over and let me know if you have further questions.