Question

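I was given a file of DNA sequences and asked to compare all of the sequences with each other and delete the ones that aren't unique. The file I'm working with is in FASTA format, so the odd lines are headers and the even lines are the sequences I want to compare. So I'm trying to store the even lines in one array and the odd lines in another. I'm very new to C, so I'm not sure where to start. I figured out how to store the whole file in one array: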

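#include <stdio.h>
#include <string.h>

int main(){
    int total_seq = 50;
    char seq[100];
    char line[total_seq][100];
    int i = 0;   // index of the next free row in line[]

    FILE *dna_file;
    dna_file = fopen("inabc.fasta", "r");

    if (dna_file==NULL){
       printf("Error");
       return 1;   // nothing to read, so stop here
       }
    // stop after total_seq lines so line[] cannot overflow
    while(i < total_seq && fgets(seq, sizeof seq, dna_file)){
       strcpy(line[i], seq);
       printf("%s", seq);
       i++;
      }

    fclose(dna_file);


    return 0;
    }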

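I'm thinking I'd have to incorporate some kind of code like this:

for (i = 0; i < rows; i++){

    // getline() here just stands for "read the next line"
    if (i % 2 == 0) header[i/2] = getline();
    else seq[i/2] = getline();
}

But I'm not sure how to implement it.

Any help would be greatly appreciated!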

Answer 1

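Could you give me an example of the data in the file?

Am I right in thinking it would look like:

header
sequence
header
sequence

and so on?

Maybe you could do something like this: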

#include <stdio.h>

int main(){
int total_seq = 50;
char seq[100];
char line[total_seq][100];

FILE *dna_file;
dna_file = fopen("inabc.fasta", "r");

if (dna_file==NULL){
   printf("Error");
   return 1;
   }

// The file opened successfully, so read it line by line
int counter = 1;
while(fgets(seq, sizeof seq, dna_file)){
   // If counter is odd
      // Copy the line just read (seq) into the headers array
   // If counter is even
      // Copy the line just read (seq) into the sequence array
   // Increment counter
 }
// Now you have all the sequences & headers. Remove any duplicates
// FOR each element of the 'sequence' array, indexed by 'j', where 'j' starts at 0
   // IF (sequence[j][0] != '~') So if it's not already marked with our chosen escape character
      // FOR each later element of the 'sequence' array, indexed by 'k', where 'k' starts at 'j + 1'
         // IF (strcmp(sequence[j], sequence[k]) == 0) (== would only compare addresses, so use strcmp)
            // SET sequence[k][0] = '~';
            // SET header[k][0] = '~';
         // END IF
      // END FOR
   // END IF
// END FOR

// You'd then need an algorithm to run through the arrays. If a '~' is found, move the following non-tilde sequence down to its position, and so on.

// EDIT: In fact, it would probably be easier, when writing back to file, to just skip any entry where sequence[x][0] == '~' (where 'x' iterates through all)

// Finally write back to file

fclose(dna_file);

return 0;
}
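
As a rough sketch of the outline above: this assumes every record is exactly one header line followed by one sequence line, and that no line is longer than 99 characters (a longer line would be split by fgets and throw off the odd/even count). The names load_fasta, mark_duplicates, write_unique, MAX_RECORDS and MAX_LINE are made up for illustration, and the sizes just mirror the 50 and 100 from the question. Like the outline, it keeps the first copy of each sequence and marks later copies with '~'.

```c
#include <stdio.h>
#include <string.h>

#define MAX_RECORDS 50   /* assumed capacity, mirrors total_seq in the question */
#define MAX_LINE    100  /* assumed maximum line length, as in the question */

/* Read alternating header/sequence lines into two parallel arrays.
   Returns the number of complete (header, sequence) records stored. */
int load_fasta(FILE *fp, char header[][MAX_LINE], char sequence[][MAX_LINE],
               int max_records)
{
    char buf[MAX_LINE];
    int line_no = 0;   /* 0-based count of lines read so far */
    int n = 0;         /* complete records stored so far */

    while (n < max_records && fgets(buf, sizeof buf, fp)) {
        buf[strcspn(buf, "\r\n")] = '\0';   /* strip the line ending */
        if (line_no % 2 == 0) {
            strcpy(header[n], buf);         /* 1st, 3rd, ... line: header */
        } else {
            strcpy(sequence[n], buf);       /* 2nd, 4th, ... line: sequence */
            n++;
        }
        line_no++;
    }
    return n;
}

/* Mark every later copy of a sequence (and its header) with "~".
   Returns the number of records left unmarked. */
int mark_duplicates(char header[][MAX_LINE], char sequence[][MAX_LINE], int n)
{
    int kept = 0;
    for (int j = 0; j < n; j++) {
        if (sequence[j][0] == '~')
            continue;                       /* already marked as a duplicate */
        kept++;
        for (int k = j + 1; k < n; k++) {
            if (strcmp(sequence[j], sequence[k]) == 0) {
                strcpy(sequence[k], "~");
                strcpy(header[k], "~");
            }
        }
    }
    return kept;
}

/* Write back only the records that were not marked. */
void write_unique(FILE *out, char header[][MAX_LINE],
                  char sequence[][MAX_LINE], int n)
{
    for (int i = 0; i < n; i++) {
        if (sequence[i][0] != '~')
            fprintf(out, "%s\n%s\n", header[i], sequence[i]);
    }
}
```

You would call load_fasta on the opened file, then mark_duplicates, then write_unique on the output file. The nested loop compares every pair, which should be fine for around 50 sequences; for a much larger file you would probably want to sort or hash the sequences instead.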

Answer 2

To store the even lines of a file in one array and the odd lines in another,
read each char and swap output files whenever a '\n' is encountered.

void Split(FILE *even, FILE* odd, FILE *source) {
  int evenflag = 1;
  int ch;
  while ((ch = fgetc(source)) != EOF) {
    if (evenflag) {
      fputc(ch, even);
    } else {
      fputc(ch, odd);
    }
    if (ch == '\n') {
      evenflag = !evenflag;
    }
  }
}
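
One possible way to call it, as a sketch only: split_file and its three path parameters are hypothetical names, not part of the code above, and Split is repeated so the block compiles on its own. Note that the flag starts out set, so counting from zero the first line of the source (a header, in the FASTA layout from the question) goes to even and the second line goes to odd.

```c
#include <stdio.h>

/* Same function as above, repeated so this sketch is self-contained. */
void Split(FILE *even, FILE *odd, FILE *source) {
  int evenflag = 1;
  int ch;
  while ((ch = fgetc(source)) != EOF) {
    if (evenflag) {
      fputc(ch, even);
    } else {
      fputc(ch, odd);
    }
    if (ch == '\n') {
      evenflag = !evenflag;   /* next line goes to the other file */
    }
  }
}

/* Hypothetical wrapper: open the three files, split, and close them.
   Returns 0 on success, -1 if any file could not be opened. */
int split_file(const char *source_path, const char *even_path,
               const char *odd_path)
{
  FILE *source = fopen(source_path, "r");
  FILE *even = fopen(even_path, "w");
  FILE *odd = fopen(odd_path, "w");
  int ok = (source != NULL && even != NULL && odd != NULL);

  if (ok) {
    Split(even, odd, source);
  }
  if (source) fclose(source);
  if (even) fclose(even);
  if (odd) fclose(odd);
  return ok ? 0 : -1;
}
```

This writes two files rather than filling two arrays, so the sequences would still have to be read back in before they can be compared.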

It is unclear whether this post also needs code for the unique-filtering step.

Answer 3

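First: write a function that counts the number of newlines (\n) in the file. Then write a function that searches for the n-th newline. Finally, write a function that goes through and reads from one '\n' to the next.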
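
A minimal sketch of those three functions might look like this. The names count_newlines, seek_after_newline and read_until_newline are invented for illustration, and the sketch assumes the stream is seekable (a regular file opened with fopen).

```c
#include <stdio.h>

/* Count the '\n' characters in the whole file.
   The stream is rewound before and after counting. */
long count_newlines(FILE *fp)
{
    long count = 0;
    int ch;

    rewind(fp);
    while ((ch = fgetc(fp)) != EOF) {
        if (ch == '\n')
            count++;
    }
    rewind(fp);
    return count;
}

/* Position the stream just after the n-th newline
   (n == 0 means the start of the file).
   Returns 0 on success, -1 if the file has fewer than n newlines. */
int seek_after_newline(FILE *fp, long n)
{
    int ch;

    rewind(fp);
    while (n > 0 && (ch = fgetc(fp)) != EOF) {
        if (ch == '\n')
            n--;
    }
    return n == 0 ? 0 : -1;
}

/* Read from the current position up to the next '\n' (or end of file).
   The newline is consumed but not stored; characters that do not fit
   in buf are dropped. Returns the number of characters stored. */
size_t read_until_newline(FILE *fp, char *buf, size_t size)
{
    size_t len = 0;
    int ch;

    if (size == 0)
        return 0;
    while ((ch = fgetc(fp)) != EOF && ch != '\n') {
        if (len + 1 < size)
            buf[len++] = (char)ch;
    }
    buf[len] = '\0';
    return len;
}
```

With these, the sequence on line 2 is what read_until_newline returns after seek_after_newline(fp, 1), the one on line 4 comes after seek_after_newline(fp, 3), and so on. Rescanning from the start for every line is slow on big files, but it keeps each function simple.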

Alternatively, you could go online and read up on string parsing.

How to store the even lines of a file in one array and the odd lines in another
