Question

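I wrote a C program that parses a .txt file into an array of doubles. My .txt file is formatted so that each point is delimited by ",". Now I want the same code to parse the same data, but from a .csv file. When I change the file type, I get a segmentation fault.

Why is this happening? Am I wrong in thinking that these two file types would be read in the same way?

The main question of this post is: what is the difference between reading a .txt and a .csv?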

/*
 * Calibration File Read Test
 */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main ()
{
    FILE *myfile = fopen ( "BarEast.txt", "r" );
    /* I want to change this file type to .csv */

    /* opening file for reading */
    if(myfile == NULL)
    {
        printf("Error opening file");
        return(-1);
    }

    int i = 0;
    int j, k;
    char *result[361] = {0};
    char line[10];
    char *value;

    while(fgets(line, sizeof(line), myfile))
    {
        value = strtok(line, ",");
        result[i] = malloc(strlen(value) + 1);
        strcpy(result[i], value);
        i++;
    }

    double val;
    double cal[361] = {0};

    for(k = 0; k < 361; k++)
    {
        val = atof(result[k]);
        cal[k] = val;
    }

    for(j = 0; j < 361; j++)
    {
        printf("Element[%d] = %f\n", j, cal[j]);
    }

    fclose(myfile);

    return 0;
}

Answer 1

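The problem is not the name of the file; it is that the files have different contents. The different contents expose memory problems in the code.

My eye immediately went to the hard-coded 361. That assumes there are exactly 361 lines in the input file, and there is your segfault. It happens on line 40 (identified using valgrind), when val = atof(result[k]); walks off the filled part of the result array. Hard-coding sizes is very tempting in C. Don't do it, especially for input; it is a crutch you can't rely on.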

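Instead, the code has to adapt to the number of fields and lines in the file. You can write your own dynamic array code using realloc, but there are plenty of C libraries that will do this for you, and do it better. I reach for GLib for the basics.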
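For reference, the "write your own with realloc" option mentioned above could look roughly like the sketch below. The type DoubleVec and the functions vec_push and vec_free are hypothetical names made up for this illustration; they are not part of the question's code or of GLib.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array of doubles; all names are illustrative only. */
typedef struct {
    double *data;  /* heap buffer, NULL while empty */
    size_t len;    /* number of elements in use */
    size_t cap;    /* number of elements allocated */
} DoubleVec;

/* Append one value, doubling the capacity when the buffer is full.
   Returns 0 on success, -1 if realloc fails (the old data stays valid). */
int vec_push(DoubleVec *v, double x)
{
    assert(v != NULL);
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 16;
        double *p = realloc(v->data, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        v->data = p;
        v->cap = new_cap;
    }
    v->data[v->len++] = x;
    return 0;
}

/* Release the buffer and reset the vector to the empty state. */
void vec_free(DoubleVec *v)
{
    assert(v != NULL);
    free(v->data);
    v->data = NULL;
    v->len = 0;
    v->cap = 0;
}
```

Start from DoubleVec v = {0};, call vec_push(&v, value) once per parsed number, and loop up to v.len instead of a fixed 361 when processing the values.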

Another problem is that you have only allocated 10 bytes for each line. This is very small. fgets will not write past the buffer, but any line longer than 9 characters (and there will be some) gets split across several reads, so each chunk is treated as a separate line and i can run past the end of result. Any sort of static memory allocation when reading input is going to be a problem. Using getline instead of fgets avoids the question of how much memory to allocate per line; getline takes care of that for you. Be careful: getline reuses line, so if you need to keep or modify its contents, strdup it first.

/*
 * Calibration File Read Test
 */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <glib.h>

int main (int argc, char **argv)
{
    /* Check we got the right number of arguments. */
    if( argc != 2 ) {
        fprintf(stderr, "Usage: %s <filename>\n", argv[0]);
        return -1;
    }

    /* Open the file */
    FILE *fp = fopen ( argv[1], "r" );
    if(fp == NULL)
    {
        fprintf(stderr, "Error opening file %s for reading.\n", argv[1]);
        return(-1);
    }

    /* A dynamic array which will grow as needed */
    GArray *result = g_array_new(TRUE, TRUE, sizeof(char *));

    /* Read each line using getline, which does the line memory
       allocation for you. No buffer overflow to worry about. */
    char *line = NULL;
    size_t linecap = 0;
    while(getline(&line, &linecap, fp) > 0) {
        /* This will only read the first cell. Exercise left for the reader. */
        char *value = strtok(line, ",");
        if( value == NULL ) {
            fprintf(stderr, "Could not parse %s\n", line);
            continue;
        }

        /* Copy the field, because getline reuses line on the next call */
        char *field = malloc(strlen(value) + 1);
        strcpy(field, value);

        g_array_append_val(result, field);
    }

    free(line);
    fclose(fp);

    /* Iterate through the array using result->len to know the length */
    for(guint i = 0; i < result->len; i++)
    {
        printf("Element[%u] = %s\n", i, g_array_index(result, char *, i));
    }

    /* Free the copied fields, then the array itself */
    for(guint i = 0; i < result->len; i++)
    {
        free(g_array_index(result, char *, i));
    }
    g_array_free(result, TRUE);

    return 0;
}

I have taken out the atof conversion because it distracts from the main problem. You can put it back if you like.

This still has the problem of only reading the first cell of each line. I leave that to you.
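As a starting point for reading every cell on a line, here is a minimal sketch that keeps calling strtok until the line is exhausted and converts each field with strtod. The function name parse_csv_doubles is hypothetical. It assumes plain numeric fields with no quoting, and because strtok merges consecutive delimiters, empty fields are silently skipped.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split `line` on commas (modifying it in place) and
   store up to `max` numeric fields in `out`. Returns how many were stored.
   Fields that do not start with a number are skipped. */
size_t parse_csv_doubles(char *line, double *out, size_t max)
{
    size_t n = 0;
    assert(line != NULL && out != NULL);

    /* Treat line endings as delimiters too, so the last field is clean. */
    for (char *tok = strtok(line, ",\r\n");
         tok != NULL && n < max;
         tok = strtok(NULL, ",\r\n")) {
        char *end;
        double v = strtod(tok, &end);
        if (end == tok)
            continue;   /* no digits consumed: not a number */
        out[n++] = v;
    }
    return n;
}
```

Inside the getline loop, the line buffer can be passed to this function directly, since each call to getline overwrites it anyway.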

Answer 2

The conversion loop in your code

for(k = 0; k < 361; k++)
{
    val = atof(result[k]);
    cal[k] = val;
}

reads all 361 entries of the result array, but memory is only allocated for an element when there is data to put into it:

result[i] = malloc(strlen(value) + 1);

If fewer than 361 records are created, the loop reads entries that were never allocated (they are still NULL), hence the error.

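You need to keep a record of how many results were read, then use that value to make sure you stay in bounds when processing the result array.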
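A minimal sketch of that idea follows. The function names read_first_fields and print_values are hypothetical, and the 256-byte line buffer is an assumption about the input (every line is expected to fit in it). Like the question's code, it only looks at the first field of each line.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert the first comma-separated field of each line
   to double and store it in cal[], never writing more than `max` entries.
   Returns the number of values actually stored.
   Assumption: each line fits in the 256-byte buffer. */
size_t read_first_fields(FILE *fp, double *cal, size_t max)
{
    char line[256];
    size_t count = 0;
    assert(fp != NULL && cal != NULL);

    while (count < max && fgets(line, sizeof line, fp) != NULL) {
        char *value = strtok(line, ",\r\n");
        if (value == NULL)
            continue;               /* blank line: nothing to convert */
        cal[count++] = atof(value);
    }
    return count;
}

/* Print only the entries that were actually read. */
void print_values(const double *cal, size_t count)
{
    for (size_t i = 0; i < count; i++)
        printf("Element[%zu] = %f\n", i, cal[i]);
}
```

With double cal[361];, calling size_t n = read_first_fields(myfile, cal, 361); followed by print_values(cal, n); touches only the slots that were filled, whether the file has 10 lines or 1000.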

The file extension makes no difference to how a file is read.

.txt vs .csv parsing in C

2 answers: