C++ program opens a file correctly on Linux but not on Windows

Asked: 2014-09-06 01:04:19

Tags: c++ linux windows gcc mingw

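I compiled a Linux program on Windows with MinGW, but its output is wrong.

Description of the error:
The program's output on Windows differs from its output on Linux. This is what it looks like on Windows: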

>tig_2
CAATCTTCAGAGTCCAGAGTGGGAGGCACAGACTACAGAAAATGAGCAGCGGGGCTGGTA
>cluster_1001_conTTGGTGAAGAGAATTTGGACATGGATGAAGGCTTGGGCTTGACCATGCGAAGG

Expected output:

>cluster_1001_contig2
CAATCTTCAGAGTCCAGAGTGGGAGGCACAGACTACAGAAAATGAGCAGCGGGGCTGGTA
>cluster_1001_contig1
TTGGTGAAGAGAATTTGGACATGGATGAAGGCTTGGGCTTGACCATGCGAAGG

(Note: the real output is too large to paste here, so the samples above are illustrative rather than the actual output.)

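Possible cause:
I noticed that if I convert the input file's line endings from Linux (LF) to Windows (CRLF), it almost works: only the first character of the file (>) is missing. The same code runs perfectly on Linux without any conversion of the input. So the problem must be in the function that parses the input, not in the function that writes the output: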

seq_db.Read( db_in.c_str(), options );
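
To confirm which line endings an input file really uses before and after conversion, a small diagnostic along these lines may help. This is only a sketch: `LineEndingCount` and `count_line_endings` are hypothetical names that are not part of the program in question, and the file is opened in binary mode so the raw bytes are counted without any translation by the runtime.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper type: tallies of the two line-ending styles in a file.
struct LineEndingCount {
    long crlf = 0;  // "\r\n" pairs (Windows style)
    long lf = 0;    // bare "\n" (Linux style)
};

// Count line endings by scanning the raw bytes of the file.
LineEndingCount count_line_endings(const std::string &path) {
    // "rb": no newline translation, so '\r' bytes are visible as they are stored.
    FILE *f = std::fopen(path.c_str(), "rb");
    if (f == NULL) throw std::runtime_error("cannot open " + path);

    LineEndingCount n;
    int prev = 0;
    int c;
    while ((c = std::fgetc(f)) != EOF) {
        if (c == '\n') {
            if (prev == '\r') ++n.crlf;
            else ++n.lf;
        }
        prev = c;
    }
    std::fclose(f);
    return n;
}
```

If `crlf` is zero for the original input and `lf` is zero after conversion, the two files differ only in their line endings, which points at how the input is opened and read.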

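Source code:
This is the part that parses the input file. I may be wrong, though, and the fault may be somewhere else. The full source code is available if needed.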

void SequenceDB::Read( const char *file, const Options & options )
{
    Sequence one;
    Sequence dummy;
    Sequence des;
    Sequence *last = NULL;
    FILE *swap = NULL;
    FILE *fin = fopen( file, "r" );
    char *buffer = NULL;
    char *res = NULL;
    size_t swap_size = 0;
    int option_l = options.min_length;
    if( fin == NULL ) bomb_error( "Failed to open the database file" );
    if( options.store_disk ) swap = OpenTempFile( temp_dir );
    Clear();
    dummy.swap = swap;
    buffer = new char[ MAX_LINE_SIZE+1 ];

    while (not feof( fin ) || one.size) { /* do not break when the last sequence is not handled */
        buffer[0] = '>';
        if ( (res=fgets( buffer, MAX_LINE_SIZE, fin )) == NULL && one.size == 0) break;
        if( buffer[0] == '+' ){
            int len = strlen( buffer );
            int len2 = len;
            while( len2 && buffer[len2-1] != '\n' ){
                if ( (res=fgets( buffer, MAX_LINE_SIZE, fin )) == NULL ) break;
                len2 = strlen( buffer );
                len += len2;
            }
            one.des_length2 = len;
            dummy.des_length2 = len;
            fseek( fin, one.size, SEEK_CUR );
        }else if (buffer[0] == '>' || buffer[0] == '@' || (res==NULL && one.size)) {
            if ( one.size ) { // write previous record
                one.dat_length = dummy.dat_length = one.size;
                if( one.identifier == NULL || one.Format() ){
                    printf( "Warning: from file \"%s\",\n", file );
                    printf( "Discarding invalid sequence or sequence without identifier and description!\n\n" );
                    if( one.identifier ) printf( "%s\n", one.identifier );
                    printf( "%s\n", one.data );
                    one.size = 0;
                }
                one.index = dummy.index = sequences.size();
                if( one.size > option_l ) {
                    if ( swap ) {
                        swap_size += one.size;
                        // so that size of file < MAX_BIN_SWAP about 2GB
                        if ( swap_size >= MAX_BIN_SWAP) {
                            dummy.swap = swap = OpenTempFile( temp_dir );
                            swap_size = one.size;
                        }
                        dummy.size = one.size;
                        dummy.offset = ftell( swap );
                        dummy.des_length = one.des_length;
                        sequences.Append( new Sequence( dummy ) ); 
                        one.ConvertBases();
                        fwrite( one.data, 1, one.size, swap );
                    }else{
                        //printf( "==================\n" );
                        sequences.Append( new Sequence( one ) ); 
                        //printf( "------------------\n" );
                        //if( sequences.size() > 10 ) break;
                    }
                    //if( sequences.size() >= 10000 ) break;
                }
            }
            one.size = 0;
            one.des_length2 = 0;

            int len = strlen( buffer );
            int len2 = len;
            des.size = 0;
            des += buffer;
            while( len2 && buffer[len2-1] != '\n' ){
                if ( (res=fgets( buffer, MAX_LINE_SIZE, fin )) == NULL ) break;
                des += buffer;
                len2 = strlen( buffer );
                len += len2;
            }
            size_t offset = ftell( fin );
            one.des_begin = dummy.des_begin = offset - len;
            one.des_length = dummy.des_length = len;

            int i = 0;
            if( des.data[i] == '>' || des.data[i] == '@' || des.data[i] == '+' ) i += 1;
            if( des.data[i] == ' ' or des.data[i] == '\t' ) i += 1;
            if( options.des_len and options.des_len < des.size ) des.size = options.des_len;
            while( i < des.size and ( des.data[i] != '\n') ) i += 1;
            des.data[i] = 0;
            one.identifier = dummy.identifier = des.data;
        } else {
            one += buffer;
        }
    }
#if 0
    int i, n = 0;
    for(i=0; i<sequences.size(); i++) n += sequences[i].bufsize + 4;
    cout<<n<<"\t"<<sequences.capacity() * sizeof(Sequence)<<endl;
    int i;
    scanf( "%i", & i );
#endif
    one.identifier = dummy.identifier = NULL;
    delete[] buffer;
    fclose( fin );
}

The input file has the following format:

> comment
ACGTACGTACGTACGTACGTACGTACGTACGT
> comment
ACGTACGTACGTACGTACGTACGTACGTACGT
> comment
ACGTACGTACGTACGTACGTACGTACGTACGT
etc

1 answer:

Answer 0 (score: 1)

The problem is most likely that you need to pass the "rb" mode string when calling fopen. "rb" opens the file in binary mode, whereas "r" opens it in "text" mode.

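Because you move back and forth between Linux and Windows, the line-ending characters differ. If you open the file as "text" on Windows but the file uses Linux line endings, you are lying to Windows about it being a Windows text file, so the runtime gets the CR/LF translation completely wrong.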
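
One place where this can matter in the posted code is the `ftell( fin )` call used to compute `des_begin` as `offset - len`. The C standard only promises that `ftell` on a text stream returns a value that can be passed back to `fseek`; only on a binary stream is it the number of bytes from the start of the file. The sketch below is a way to check this on your own system, not a statement about what any particular runtime returns. `offset_after_first_line` is a hypothetical helper name.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper: read one line with fgets and report ftell() afterwards.
// `mode` is passed straight to fopen, e.g. "r" (text) or "rb" (binary).
// Returns -1 if no line could be read.
long offset_after_first_line(const std::string &path, const char *mode) {
    FILE *f = std::fopen(path.c_str(), mode);
    if (f == NULL) throw std::runtime_error("cannot open " + path);

    char buf[256];
    long pos = -1;
    if (std::fgets(buf, static_cast<int>(sizeof buf), f) != NULL) {
        pos = std::ftell(f);  // byte offset only when the stream is binary
    }
    std::fclose(f);
    return pos;
}
```

Calling this with "r" and with "rb" on the same LF-only input file under Windows and comparing the two results shows whether the text-mode offsets can be trusted for arithmetic such as `offset - len`.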

Therefore, open the file as binary with "rb" so that no CR/LF translation is performed.
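
A minimal sketch of that approach, assuming C++11: open with "rb" and strip the line ending yourself, so the same code handles both LF and CRLF input. `strip_line_ending` and `read_lines_binary` are hypothetical names used only for illustration. They are not part of the posted program, which would instead need the equivalent handling inside `SequenceDB::Read`.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: remove a trailing '\n', then a trailing '\r', if present.
void strip_line_ending(std::string &line) {
    if (!line.empty() && line.back() == '\n') line.pop_back();
    if (!line.empty() && line.back() == '\r') line.pop_back();
}

// Hypothetical helper: read all lines of a file opened in binary mode.
std::vector<std::string> read_lines_binary(const std::string &path) {
    // "rb": the runtime does no CR/LF translation, on Windows or on Linux.
    FILE *f = std::fopen(path.c_str(), "rb");
    if (f == NULL) throw std::runtime_error("cannot open " + path);

    std::vector<std::string> lines;
    std::string line;
    char buf[4096];
    while (std::fgets(buf, static_cast<int>(sizeof buf), f) != NULL) {
        line += buf;
        // fgets stops at '\n' or when buf is full; keep appending until
        // the current line is complete.
        if (line.empty() || line.back() != '\n') continue;
        strip_line_ending(line);
        lines.push_back(line);
        line.clear();
    }
    // Flush a final line that has no terminating newline.
    if (!line.empty()) {
        strip_line_ending(line);
        lines.push_back(line);
    }
    std::fclose(f);
    return lines;
}
```

With binary mode, a CRLF input leaves a '\r' at the end of each line, which is why the sketch strips it explicitly. Code that compares against '\n' only, as the posted parser does, would otherwise carry the '\r' into identifiers and sequence data.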