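Parsing a CSV file in C when a single column contains comma-separated values

Time: 2015-09-04 01:29:17

Tags: c csv

I have to read data from a CSV file and use the values in my C function.

I am doing this with the following code: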

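int arg1;
char arg2[500];
int arg3;
char line[1000];          /* buffer for one line of the file */
char *splitline;          /* current token returned by strtok */
int firstargument = 1;    /* which column the next token belongs to */
FILE *file;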
file=fopen(filename,"r");
if (file == NULL)
{
    printf("Not able to open the file\n");
}
while (fgets(line,1000, file)!=NULL)
    {
            splitline=strtok(line,",");
            while(splitline)
            {
                if(firstargument==1)
                {
                    arg1=atoi(splitline);
                    printf("First argument is %d ",arg1);
                    firstargument=2;
                }
                else if(firstargument==2)
                {
                    /* bounded copy: a token can be longer than arg2 */
                    snprintf(arg2,sizeof arg2,"%s",splitline);
                    printf("Second argument is %s\n",arg2);
                    firstargument=3;
                }
                else
                {
                    arg3=atoi(splitline);
                    printf("Third argument is %d ",arg3);
                    firstargument=1;
                }
                splitline = strtok(NULL,",");
            }
            printf("Value to insert Key:%d,Value:%s,Height:%d\n",arg1,arg2,arg3);
            inserth(inode,arg1,arg2,arg3);
        }

But parsing fails when a single column of my CSV file contains multiple comma-separated values:

350206,Uma,1
350207,Umika,1
350208,"Vaishavi, Vaishnodevi",1
350226,Badriprasad,1
350227,"Kanak, Kanaka",1

Is there a way to read multiple values within a single column of a CSV file?

3 answers:

Answer 0: (score: 2)

Try this:

if (file == NULL){
    perror("Not able to open the file");  /* perror appends ": <reason>" and a newline itself */
    exit(EXIT_FAILURE);
}
while (fgets(line,1000, file)!=NULL){
    if( 3==sscanf(line, "%d,\"%499[^\"]\",%d", &arg1, arg2, &arg3) || //pattern 1
        3==sscanf(line, "%d,%499[^,],%d",      &arg1, arg2, &arg3)){  //pattern 2
        printf("Value to insert Key:%d,Value:%s,Height:%d\n",arg1,arg2,arg3);
        inserth(inode,arg1,arg2,arg3);
    } else {
        fprintf(stderr, "invalid format\n");
        exit(EXIT_FAILURE);
    }
}

Answer 1: (score: 2)

One solution is to implement a strtok that respects double quotes. strtok itself is easy to implement:

char * tokenise( char *str, const char *delim )
{
    static char *next = NULL;
    if( str ) next = str;
    if( !*next ) return NULL;
    str = next;
    while( !strchr( delim, *next ) ) next++;
    if( *next ) *next++ = 0;
    return str;
}

Now, that is the general case. You only care about commas, and inside a quoted field you only care about double quotes:

char * tokenise( char *str )
{
    static char *next = NULL;
    if( str ) next = str;
    if( !*next ) return NULL;
    str = next;

    if( *str == '"' ) {
       str++;
       next++;
       while( *next && *next != '"' ) next++;
       if( *next == '"' ) *next++ = 0;
    }

    while( *next && *next != ',' ) next++;
    if( *next ) *next++ = 0;
    return str;
}

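This is naive, but it should do the job. It detects a double quote (") in the first character, removes it, and then scans to the next double quote. It does not handle escaped quotes, whitespace between CSV fields, or syntax errors (such as non-comma characters after the closing quote, which it simply discards), but you get the idea.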

Answer 2: (score: 0)

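First, in a CSV file, a comma enclosed in double quotes is normally meant to be ignored as a separator. That means every line here has only 3 values (3 columns).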

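Since you have only two kinds of lines, one with double quotes and one without, a simple if-else statement can help.

I used strsep instead of strtok. I am also reading the file line by line.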

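#define _GNU_SOURCE   /* exposes getline() and strsep() on glibc */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[])
{
    FILE *file;
    char *splitline;
    char *line = NULL;
    size_t len = 0;
    ssize_t read;

    /* Assumption: the file name is passed as the first argument */
    if (argc < 2)
    {
        fprintf(stderr, "usage: %s file.csv\n", argv[0]);
        return EXIT_FAILURE;
    }
    file = fopen(argv[1], "r");
    if (file == NULL)
    {
        printf("Not able to open the file\n");
        return EXIT_FAILURE;
    }
    while ((read = getline(&line, &len, file)) != -1)
    {
        /* strsep() advances the pointer it is given, so work on a copy:
           getline() must get back the buffer it allocated */
        char *cursor = line;

        if (strstr(cursor, "\"") != NULL)
        {
            printf("%s\n", cursor);
            splitline = strsep(&cursor, ",");
            printf("%s : %s\n", splitline, cursor);
            cursor = cursor + 1;  // avoiding first doublequotes("), you may use strsep twice instead
            splitline = strsep(&cursor, "\"");
            printf("%s : %s\n", splitline, cursor);
            cursor = cursor + 1;  // removing comma and rest is the 3rd entry
            printf("%s", cursor);
        }
        else
        {
            //Routine code that expects two commas and three values
        }
    }
    free(line);
    fclose(file);
    return 0;
}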