I have two fasta files, candidate_aa_0042.fasta and candidates_aa_0035.fasta,
and two dataframes, Best_blast_candidate_hit_0042.csv and Best_blast_candidate_hit_0035.csv.
Here is an example of each:
qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore salltitles staxids scientific_name scomnames sskingdoms Order
g44459.t1_0035_0035 XP_011687429.1 39.5 157 95 0 7 163 2 158 8.1e-27 129.8 uncharacterized protein LOC105449744 [Wasmannia auropunctata] 64793 Wasmannia auropunctata Eukaryota Hymenoptera
g17612.t1_0035_0042 XP_011699787.1 59.3 349 142 0 99 447 336 684 1.5e-120 442.6 uncharacterized protein LOC105457055 [Wasmannia auropunctata] 64793 Wasmannia auropunctata Eukaryota Hymenoptera
g29924.t1_0035_0042 XP_011871948.1 67.0 261 85 1 1 260 18 278 1.3e-100 375.6 uncharacterized protein LOC105564266, partial [Vollenhovia emeryi] 411798 Vollenhovia emeryi Eukaryota Hymenoptera
g47960.t1_0035_0035 XP_011860868.1 68.8 298 93 0 1 298 142 439 3.3e-116 427.6 uncharacterized protein LOC105558006 [Vollenhovia emeryi] 411798 Vollenhovia emeryi Eukaryota Hymenoptera
g28580.t1_0035_0042 XP_011883624.1 70.0 240 69 3 1 239 41 278 1.3e-86 328.9 uncharacterized protein LOC105570787 [Vollenhovia emeryi] 411798 Vollenhovia emeryi Eukaryota Hymenoptera
and
qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore salltitles staxids scientific_name scomnames sskingdoms Order
g34354.t1_0042_0035 XP_011699801.1 43.7 135 63 4 7 128 625 759 9.3e-17 96.3 LOW QUALITY PROTEIN 64793 Wasmannia auropunctata Eukaryota Hymenoptera
g34606.t1_0042_0035 XP_011871948.1 59.8 249 79 2 1 228 51 299 3.4e-81 310.8 uncharacterized protein LOC105564266, partial [Vollenhovia emeryi] 411798 Vollenhovia emeryi Eukaryota Hymenoptera
g13215.t1_0042_0042 XP_011883625.1 62.0 242 92 0 46 287 160 401 5.4e-82 313.9 uncharacterized protein LOC105570788, partial [Vollenhovia emeryi] 411798 Vollenhovia emeryi Eukaryota Hymenoptera
g35379.t1_0042_0035 XP_011858260.1 73.3 191 51 0 4 194 690 880 6.3e-76 293.1 uncharacterized protein LOC105555830 [Vollenhovia emeryi] 411798 Vollenhovia emeryi Eukaryota Hymenoptera
g13770.t1_0042_0042 XP_011883624.1 66.5 203 65 3 10 211 33 233 1.9e-65 258.5 uncharacterized protein LOC105570787 [Vollenhovia emeryi] 411798 Vollenhovia emeryi Eukaryota Hymenoptera
I need to merge them side by side, row by row, in the same order as the seqIDs appear in the fasta files.
For example, if fasta file 1 contains:
>seq1_0035_0042
ATGGAGAGATAG
>seq6_0035_0035
ATGGATAGAGA
and fasta file 2 contains:
>seq8_0042_0042
ATGGAGAGATAG
>seq3_0042_0035
ATGGATAGAGA
then I want to merge my dataframes in that order, for example:
qseqid_1 qseqid_2 sseqid_1 sseqid_2 pident_1 pident_2 etc...
seq1_0035_0042 seq8_0042_0042 XP_011883678.1 XP_011883789.1 78.9 45.9 etc
seq6_0035_0035 seq3_0042_0035 XP_011566754.1 XP_011566754.1 67.9 78.0 etc
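PS: Not all of the SeqIDs in the fasta files are present in the dataframes, so if an ID has no match, can we still add it to the dataframe and put a NaN in the second column? Thanks for your help :))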
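
To make the goal concrete, here is a minimal sketch of the kind of result I am after, assuming pandas and assuming both dataframes share the same column names. The helper names (fasta_ids, align_to_fasta, merge_side_by_side) are hypothetical, and the toy data only mimics the example above; it is not taken from my real files. Each dataframe is reordered to follow its fasta file with reindex, so IDs without a blast hit keep their qseqid and get NaN everywhere else, and the two frames are then concatenated by row position.

```python
import io
import pandas as pd


def fasta_ids(handle):
    """Return the sequence IDs (text after '>' up to the first space), in file order."""
    return [line[1:].split()[0] for line in handle if line.startswith(">")]


def align_to_fasta(df, ids):
    """Reorder df to follow ids; IDs with no blast hit become NaN rows (qseqid is kept)."""
    # drop_duplicates is only a guard: reindex raises on a duplicated index.
    out = df.drop_duplicates("qseqid").set_index("qseqid").reindex(ids)
    out.index.name = "qseqid"
    return out.reset_index()


def merge_side_by_side(df1, ids1, df2, ids2):
    """Pair row i of file 1 with row i of file 2 (assumes identical column names)."""
    left = align_to_fasta(df1, ids1).add_suffix("_1")
    right = align_to_fasta(df2, ids2).add_suffix("_2")
    # Both frames have a 0..n-1 index, so axis=1 pairs rows by position;
    # if one fasta file is longer, the shorter side is padded with NaN.
    merged = pd.concat([left, right], axis=1)
    # Interleave the columns: qseqid_1, qseqid_2, sseqid_1, sseqid_2, ...
    cols = [f"{c}{s}" for c in df1.columns for s in ("_1", "_2")]
    return merged[cols]


# Toy stand-ins for the real inputs. With real files this would be something like
# open("....fasta") and pd.read_csv("....csv", sep=...), where the separator
# depends on how the CSV was written.
fasta1 = io.StringIO(">seq1_0035_0042\nATGGAGAGATAG\n>seq6_0035_0035\nATGGATAGAGA\n")
fasta2 = io.StringIO(">seq8_0042_0042\nATGGAGAGATAG\n>seq3_0042_0035\nATGGATAGAGA\n")
df1 = pd.DataFrame({
    "qseqid": ["seq6_0035_0035", "seq1_0035_0042"],  # deliberately out of fasta order
    "sseqid": ["XP_011566754.1", "XP_011883678.1"],
    "pident": [67.9, 78.9],
})
df2 = pd.DataFrame({
    "qseqid": ["seq8_0042_0042"],  # seq3_0042_0035 has no blast hit
    "sseqid": ["XP_011883789.1"],
    "pident": [45.9],
})

merged = merge_side_by_side(df1, fasta_ids(fasta1), df2, fasta_ids(fasta2))
print(merged)
```

In this toy run seq3_0042_0035 has no hit, so its row keeps qseqid_2 but has NaN in sseqid_2 and pident_2, which is the behaviour I would like for the missing pairs.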