Does an alternative nucleotide lead to a missense mutation?

Time: 2016-04-13 15:17:11

Tags: r sequence fasta

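I am trying to compare large sequence data with and without SNPs and to label each SNP as synonymous or non-synonymous. I have a .fasta sequence and a .bim file from PLINK that lists the reference (conserved) and alternative nucleotides: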

head(test)

  pos ALT REF
1   2   G   T
2   8   G   T
3  65   C   G
4  68   C   G
5  77   T   C
6  78   G   C

I can replace the reference nucleotides with the alternative ones:

ref[test$pos]=as.vector(test$ALT)
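
Note that the assignment above overwrites ref in place, so the original sequence is no longer available for comparison. A minimal sketch that keeps both vectors, assuming a short toy sequence (the first nine nucleotides of the example) and a two-row test data frame:

```r
# Toy data: first nine nucleotides of the reference and two SNPs
ref  <- c("a", "t", "g", "t", "c", "g", "t", "c", "g")
test <- data.frame(pos = c(2, 8), ALT = c("G", "G"), stringsAsFactors = FALSE)

# Copy the reference first, then substitute only in the copy
alt <- ref
alt[test$pos] <- as.character(test$ALT)

print(paste(ref, collapse = ""))
print(paste(alt, collapse = ""))
```

Because the ALT alleles are upper case, the substituted positions stay visible in alt.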

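I need to say whether each substitution changes the amino acid or not. I would like to use the seqinr package, but maybe my approach is wrong? So I have two character vectors holding the sequences (the alternative nucleotides in the alt vector are marked in upper case):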

ref=c("a","t","g","t","c","g","t","c","g","g","c","c","g","c","g","g","g","c",
"c","a","a","g","a","c","a","a","c","g","g","a","g","a","t","a","c","c",
"g","c","t","g","g","g","g","a","c","t","a","c","a","t","c","a","a","g",
"t","g","g","a","t","g","t","g","c","g","g","c","g","c","c","g","g","t",
"g","g","c","c","g","t","g","c","g","g","g","c","g","g","c","g","c","c",
"a","t","g","g","c","c","a","a","c","c","t","c","c","a","g","c","g","c",
"g","g","c","g","t","t","g","g","c","t","c","c","c","t","c","g","t","c",
"c","g","t","g","a","c","a","t","t","g","g","c","g","a","c","c","c","c",
"t","g","c","c","t","c","a","a","c","c","c","a","t","c","c","c","c","c",
"g","t","t","a","a","g")

alt=c("a","G","g","t","c","g","t","G","g","g","c","c","g","c","g","g","g","c",
"c","a","a","g","a","c","a","a","c","g","g","a","g","a","t","a","c","c",
"g","c","t","g","g","g","g","a","c","t","a","c","a","t","c","a","a","g",
"t","g","g","a","t","g","t","g","c","g","C","c","g","C","c","g","g","t",
"g","g","c","c","T","G","g","c","g","g","C","c","g","g","c","g","c","c",
"a","t","g","g","c","c","a","a","c","c","t","c","c","a","g","c","g","c",
"g","g","c","g","t","t","g","g","c","t","C","c","c","t","c","g","C","c",
"c","T","t","g","a","c","a","T","t","g","g","c","g","a","c","c","c","c",
"t","g","c","c","t","c","a","a","c","c","c","a","t","c","c","c","C","c",
"g","t","t","a","a","g")

I can translate these vectors into amino acids:

t_ref=translate(ref)
t_alt=translate(alt)

Then I can compare them and see which positions changed:

which(ref != alt)
which(t_ref != t_alt)

So the question is how to mark, in the test data frame, which nucleotides lead to an amino acid change. Thanks in advance.

1 answer:

Answer 0 (score: 2)

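Use modular arithmetic (integer division with %/%) on the pos column to map each nucleotide position to its position in the protein sequence: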

library(seqinr)
(test$pos - 1) %/% 3  # zero-based codon index (pos is 1-based), so add 1 to get a 1-based value
#[1]  0  2 21 22 25 25
t_ref[ 1+((test$pos-1) %/% 3)]
#[1] "M" "S" "G" "A" "R" "R"  # lookup value in prot-seq
t_alt[ 1+((test$pos-1) %/% 3)]
#[1] "R" "W" "A" "A" "L" "L"  # compare against this value
test$change  <- t_ref[ 1+((test$pos-1) %/% 3)] != t_alt[ 1+((test$pos-1) %/% 3)]  # TRUE = amino acid changed
test
 #=====================
  pos ALT REF change
1   2   G   T   TRUE
2   8   G   T   TRUE
3  65   C   G   TRUE
4  68   C   G  FALSE
5  77   T   C   TRUE
6  78   G   C   TRUE
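
Comparing the two full translations judges SNPs that fall in the same codon (such as positions 77 and 78) together. If each SNP should instead be classified on its own, one option is to rebuild only the affected codon. The sketch below is base R only and is not part of the original answer: annotate_snps is a hypothetical helper, the lookup table is the standard genetic code written out by hand as a stand-in for seqinr's translate, and the SNP at position 6 is made up to show a synonymous case. It assumes the sequence starts in frame and that every SNP lies in a complete codon.

```r
# Standard genetic code; codons ordered t, c, a, g with the third base varying fastest
bases <- c("t", "c", "a", "g")
grid  <- expand.grid(third = bases, second = bases, first = bases,
                     stringsAsFactors = FALSE)
genetic_code <- setNames(
  strsplit("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]],
  paste0(grid$first, grid$second, grid$third)
)

# Hypothetical helper: classify each SNP independently of the others
annotate_snps <- function(ref, pos, alt_allele) {
  ref    <- tolower(ref)
  start  <- 3 * ((pos - 1) %/% 3) + 1   # first nucleotide of the affected codon
  offset <- (pos - 1) %% 3              # 0, 1 or 2: position inside the codon
  ref_aa <- character(length(pos))
  alt_aa <- character(length(pos))
  for (i in seq_along(pos)) {
    codon <- ref[start[i]:(start[i] + 2)]
    ref_aa[i] <- genetic_code[[paste(codon, collapse = "")]]
    codon[offset[i] + 1] <- tolower(alt_allele[i])   # substitute this SNP only
    alt_aa[i] <- genetic_code[[paste(codon, collapse = "")]]
  }
  data.frame(pos = pos, ref_aa = ref_aa, alt_aa = alt_aa,
             change = ref_aa != alt_aa, stringsAsFactors = FALSE)
}

# First nine nucleotides of the example; the SNP at position 6 is hypothetical
ref9 <- c("a", "t", "g", "t", "c", "g", "t", "c", "g")
snps <- annotate_snps(ref9, pos = c(2, 6, 8), alt_allele = c("G", "A", "G"))
print(snps)
```

Here positions 2 and 8 change the amino acid (M to R and S to W), while the made-up SNP at position 6 turns tcg into tca, which still encodes S.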

I got the "register" wrong in the modular arithmetic on my first attempt; note that this is the correct "register" for translation:

> (1:21 -1) %/% 3
 [1] 0 0 0 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6
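
The same arithmetic also gives the position of a SNP inside its codon, which can be useful because third-position changes are often synonymous. A minimal sketch using the positions from the question; the variable names codon and codon_pos are illustrative only:

```r
pos <- c(2, 8, 65, 68, 77, 78)      # SNP positions from the question (1-based)

codon     <- (pos - 1) %/% 3 + 1    # 1-based codon (amino acid) index
codon_pos <- (pos - 1) %% 3 + 1     # 1, 2 or 3: position within that codon

print(data.frame(pos, codon, codon_pos))
```

Positions 77 and 78 both map to codon 26, which is why subtracting 1 before the integer division matters: without it, position 78 would be assigned to codon 27.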