I have a list of positions and would like to look up the nucleotide at each of those positions in a custom FASTA file. For example:
chr1 1000
chr2 2000
chr3 4000
Is there an existing tool in Python that can do this?
Answer 0 (score: 3)
Given the FASTA file chromosomes.fasta:
>chr1
GATTACA
>chr2
ATTACGA
>chr3
GCCAACG
and the positions file positions.txt:
chr1 3
chr2 4
chr3 5
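You can use the following code (requires Biopython; written for Python 3):

from Bio import SeqIO

# Index the FASTA records by their ID (chr1, chr2, ...).
record_dict = SeqIO.to_dict(SeqIO.parse('chromosomes.fasta', 'fasta'))

# Read "chromosome position" pairs, skipping blank lines.
chromosome_positions = {}
with open('positions.txt') as f:
    for line in f:
        if line.strip():
            chromosome, position = line.split()
            chromosome_positions[chromosome] = int(position)

# Look up the base at each position (zero-based index).
for chromosome, position in chromosome_positions.items():
    base = record_dict[chromosome].seq[position]
    print(chromosome, position, base)

On Python 3.7 or later, where dictionaries keep insertion order, this will output:

chr1 3 T
chr2 4 C
chr3 5 C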
Note that Python uses zero-based indexing, so a position of 5 in positions.txt gives you the sixth base of the corresponding sequence.
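
Genomic positions are often written 1-based, in which case the index has to be shifted by one before the lookup. Here is a minimal sketch of that adjustment using only the standard library, so it runs without Biopython. The helper names `parse_fasta` and `base_at` and the `one_based` flag are made up for this illustration and are not part of any library; whether your positions are 1-based is an assumption you need to check against wherever they came from.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dict."""
    sequences = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-separated token of the header as the name.
            name = line[1:].split()[0]
            sequences[name] = ""
        elif name is not None:
            sequences[name] += line
    return sequences


def base_at(sequences, chromosome, position, one_based=True):
    """Return the base at `position`, treating it as 1-based by default."""
    seq = sequences[chromosome]
    index = position - 1 if one_based else position
    if not 0 <= index < len(seq):
        raise IndexError(
            "position %d is outside %s (length %d)" % (position, chromosome, len(seq))
        )
    return seq[index]


# Same example data as in the answer above.
fasta_text = """>chr1
GATTACA
>chr2
ATTACGA
>chr3
GCCAACG
"""

sequences = parse_fasta(fasta_text)
for chrom, pos in [("chr1", 3), ("chr2", 4), ("chr3", 5)]:
    print(chrom, pos, base_at(sequences, chrom, pos))
```

Passing `one_based=False` reproduces the zero-based lookup from the answer. The explicit range check also stops a position of 0 from silently wrapping around to the last base, which is what a plain negative index would do.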