How to use POSIX getline() to read an arbitrarily long block of text

Time: 2015-03-30 11:14:53

Tags: c posix

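I want to read an arbitrarily long string from a file. I want to read it line by line, but end up with a pointer to a single string containing the complete input. Obviously, I want to enforce a configurable limit on the total length (which can be checked before each read of the next line). I want to use POSIX functions and have started implementing something simple based on getline(), starting from http://crasseux.com/books/ctutorial/getline.html. Can I do this with getline(), for example by calling it in a while loop and passing it a pointer to the end of the previously read string? And how do I free the dynamically allocated memory?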

The following code works, but it only reads individual lines; each call overwrites the previous line.

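#define _POSIX_C_SOURCE 200809L  /* exposes getline() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

int main(void)
{
  ssize_t bytes_read;
  size_t nbytes = 100;
  char *my_string;
  FILE *input;

  input = fopen("tmp.txt", "r");
  if (input == NULL) {
    perror("tmp.txt");
    return 1;
  }

  /* getline() reallocates this buffer whenever a line does not fit */
  my_string = malloc(nbytes + 1);

  while ((bytes_read = getline(&my_string, &nbytes, input)) >= 0) {
        printf("read: %zd bytes\n", bytes_read);
        /* the -1 leaves out the trailing newline */
        printf("length of string: %zu bytes\n", strlen(my_string) - 1);
        puts(my_string);
  }

  free(my_string);

  fclose(input);

  return 0;
}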

2 answers:

Answer 0: (score: 0)

According to the documentation (Linux Programmer's Manual):

ssize_t getline(char **lineptr, size_t *n, FILE *stream);

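"If *lineptr is set to NULL and *n is set 0 before the call, then getline() will allocate a buffer for storing the line. This buffer should be freed by the user program even if getline() failed."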

So you do not need to allocate the buffer for a single line yourself.
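The rule quoted from the manual can be sketched as follows. This is a minimal example, not the answerer's code: the helper name count_lines() is hypothetical, and the stream is assumed to be already open. One buffer is reused for every line, getline() grows it as needed, and it is freed once at the end, even though the final call failed.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes getline() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: counts the lines in `stream` and stores the length
 * of the longest one (including its newline, if any) in *longest. */
size_t count_lines(FILE *stream, size_t *longest)
{
    char *line = NULL;   /* NULL + 0 asks getline() to allocate the buffer */
    size_t cap = 0;
    ssize_t len;
    size_t count = 0;

    assert(stream != NULL && longest != NULL);
    *longest = 0;

    /* getline() reuses `line`, reallocating it whenever a line is longer */
    while ((len = getline(&line, &cap, stream)) != -1) {
        if ((size_t)len > *longest)
            *longest = (size_t)len;
        ++count;
    }

    free(line);          /* required even though the last call failed */
    return count;
}
```

Because the same pointer and capacity are passed on every call, there is at most one live allocation to release, regardless of how many lines were read.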

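To read all the lines, allocate an array of pointers to char. This can be a fixed-size array, such as char* lines[200], or a dynamically allocated one: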

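size_t MAX_LINES = 10000;
char **lines = malloc(MAX_LINES * sizeof *lines);  /* check for NULL before use */
size_t i = 0;
size_t totalsize = 0;
ssize_t bytesread;

/* yourFileHandle is a FILE * that has already been opened for reading */
while (i < MAX_LINES) {
    size_t cap = 0;      /* getline() needs a pointer to the capacity */
    lines[i] = NULL;     /* NULL + 0: getline() allocates the line buffer */
    bytesread = getline(&lines[i], &cap, yourFileHandle);
    if (bytesread < 0) {
        free(lines[i]);  /* must be freed even when getline() fails */
        break;
    }
    totalsize += (size_t)bytesread;
    ++i;
}

// Allocate buffer of size totalsize + 1 (room for the terminator)
// Copy the lines into it
// Deallocate all lines pointers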

This code has not been compiled or tested; it is only meant to give you the idea.
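The remaining steps (one buffer for the total, copy the lines, free the per-line storage) can also be folded into the read loop, which avoids the array of line pointers. The sketch below is one possible way to do that, under assumptions: the name read_all_lines() and its parameters are hypothetical, and the limit is checked after a line has been read, so the line that exceeds it is consumed from the stream but not stored.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes getline() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: reads `stream` line by line and returns all lines
 * concatenated in one malloc'd, null-terminated string, or NULL on
 * allocation failure. Stops at the first line that would push the total
 * past `limit` bytes. The caller frees the result. */
char *read_all_lines(FILE *stream, size_t limit, size_t *out_len)
{
    char *line = NULL;       /* scratch buffer managed by getline() */
    size_t line_cap = 0;
    char *text = malloc(1);  /* accumulated result, starts as "" */
    size_t text_len = 0;
    ssize_t len;

    assert(stream != NULL && out_len != NULL);
    if (text == NULL)
        return NULL;
    text[0] = '\0';

    while ((len = getline(&line, &line_cap, stream)) != -1) {
        size_t n = (size_t)len;
        char *grown;

        if (n > limit - text_len)  /* text_len never exceeds limit */
            break;
        grown = realloc(text, text_len + n + 1);
        if (grown == NULL) {
            free(text);
            free(line);
            return NULL;
        }
        text = grown;
        memcpy(text + text_len, line, n + 1);  /* copies the '\0' too */
        text_len += n;
    }

    free(line);              /* getline()'s buffer, freed exactly once */
    *out_len = text_len;
    return text;
}
```

Only two allocations are ever live here: getline()'s scratch line and the accumulated text. That answers the freeing question with one free() inside the helper and one free() by the caller.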

Answer 1: (score: 0)

You can read data from a file into a dynamically sized buffer with a function similar to GNU getline(), with some buffer management thrown in. For example:

/*
 * Reads up to the specified number of chars from the specified stream,
 * appending them to the end of the data in the specified buffer.
 * Allocates a new buffer if the given one is NULL, and reallocates the
 * buffer if more space is needed than its current capacity.  The buffer
 * is ensured null-terminated.  Returns the number of new chars actually
 * read, excluding the terminator, or -1 on failure to read any data.
 *
 * buffer: a pointer to the location where the address of the target buffer
 *         is stored; NULL if no buffer has yet been allocated; the
 *         caller is responsible for freeing the pointed-to buffer.
 * buf_cap: a pointer to the capacity of the allocated buffer; updated at
 *         need.  The value initially stored here is ignored when *buffer
 *         is NULL.
 * buf_len: a pointer to the number of valid chars currently in the buffer,
 *         not including any null terminator; new data will be stored
 *         starting at the next position.  The pointed-to value is updated
 *         when chars are successfully read.
 * num_chars: the number of characters requested to be read; the actual
 *         number read may be fewer.
 * stream: the stream from which to read.
 */
ssize_t extend_line(char **buffer, size_t *buf_cap, size_t *buf_len,
        size_t num_chars, FILE *stream) {
    size_t n_read;

    if (!*buffer) { /* No buffer allocated yet */
        if (! (*buffer = malloc(num_chars + 1))) {
            /* allocation failure */
            return -1;
        }
        *buf_cap = num_chars + 1; /* matches the allocated size, as in the realloc branch */
        *buf_len = 0;
    } else if (*buf_len > *buf_cap) {
        /* invalid arguments */
        return -1;
    } else if (*buf_cap - *buf_len <= num_chars) {
        /* extend the buffer */
        size_t needed_cap = *buf_len + num_chars + 1;
        char *temp = realloc(*buffer, needed_cap);

        if (temp) {
            *buffer = temp;
            *buf_cap = needed_cap;
        } else {
            /* reallocation failure */
            return -1;
        }
    }

    /* There is now enough space for at least num_chars additional chars */
    n_read = fread(*buffer + *buf_len, 1, num_chars, stream);
    if (n_read) {
        /* update the data length and ensure null termination */
        *buf_len += n_read;
        (*buffer)[*buf_len] = '\0';
        return n_read;
    } else {
        return -1;
    }
}

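Fixed-size read requests (as implemented here) are comparatively easy, but you can use a similar approach to read through the end of a line, as getline() does. In that case a single call may need several realloc()s, and I would recommend extending the buffer in chunks rather than one byte at a time.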
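A line-oriented variant along those lines might look like the sketch below. It is an illustration under assumptions rather than the answerer's implementation: the name append_line() and the CHUNK growth step are hypothetical, it uses standard fgets() to stop at the newline, and (like any fgets()-based reader) it does not handle input containing embedded null bytes.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

#define CHUNK 64  /* assumed growth step; tune as needed */

/* Hypothetical variant: appends one line from `stream` (through the
 * newline, if present) to *buffer, growing it CHUNK bytes at a time.
 * Returns the number of chars appended, or -1 if nothing was read or an
 * allocation failed (data appended before the failure stays in place).
 * *buffer may be NULL on first use; the caller frees it. */
ssize_t append_line(char **buffer, size_t *buf_cap, size_t *buf_len,
        FILE *stream)
{
    size_t start;

    assert(buffer && buf_cap && buf_len && stream);
    if (*buffer == NULL) {
        *buf_cap = 0;
        *buf_len = 0;
    } else if (*buf_len > *buf_cap) {
        return -1;  /* invalid arguments */
    }
    start = *buf_len;

    for (;;) {
        size_t room = *buf_cap - *buf_len;
        size_t got;

        if (room < 2) {  /* need one char plus the terminator */
            char *temp = realloc(*buffer, *buf_cap + CHUNK);
            if (temp == NULL)
                return -1;
            *buffer = temp;
            *buf_cap += CHUNK;
            room += CHUNK;
            (*buffer)[*buf_len] = '\0';  /* keep the data terminated */
        }
        if (room > INT_MAX)
            room = INT_MAX;  /* fgets() takes an int size */
        if (fgets(*buffer + *buf_len, (int)room, stream) == NULL)
            break;  /* end of file or read error */
        got = strlen(*buffer + *buf_len);
        *buf_len += got;
        if (got == 0 || (*buffer)[*buf_len - 1] == '\n')
            break;  /* reached the end of the line */
    }
    return *buf_len > start ? (ssize_t)(*buf_len - start) : -1;
}
```

Calling it in a loop with the same buffer, capacity, and length variables accumulates the whole input in one string; a total-length limit can be enforced by checking the length variable between calls.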