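Converting nucleotides to amino acids using JavaScript

Time: 2017-05-01 19:44:03

Tags: javascript dna-sequence genetics

I'm creating a Chrome extension that converts a string of nucleotides of length nlen into the corresponding amino acids.

I've done something similar in Python before, but I'm still very new to JavaScript and I'm struggling to translate the same logic from Python to JavaScript. The code I have so far is below: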

function translateInput(n_seq) {
  // code to translate goes here

  // length of input nucleotide sequence
  var nlen = n_seq.length

  // declare initially empty amino acids string
  var aa_seq = ""

  // iterate over each chunk of three characters/nucleotides
  // to match it with the correct codon
  for (var i = 0; i < nlen; i++) {




      aa_seq.concat(codon)
  }

  // return final string of amino acids   
  return aa_seq
}

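I know that I want to iterate over three characters at a time, match them to the correct amino acid, and then keep concatenating that amino acid onto the output string of amino acids (aa_seq), returning that string once the loop is complete.

I also tried creating a dictionary of the codon to amino acid relationships, and was wondering whether there is a way to use something like it as a tool for matching the three-character codons to their respective amino acids: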

codon_dictionary = { 
 "A": ["GCA","GCC","GCG","GCT"], 
 "C": ["TGC","TGT"], 
 "D": ["GAC", "GAT"],
 "E": ["GAA","GAG"],
 "F": ["TTC","TTT"],
 "G": ["GGA","GGC","GGG","GGT"],
 "H": ["CAC","CAT"],
 "I": ["ATA","ATC","ATT"],
 "K": ["AAA","AAG"],
 "L": ["CTA","CTC","CTG","CTT","TTA","TTG"],
 "M": ["ATG"],
 "N": ["AAC","AAT"],
 "P": ["CCA","CCC","CCG","CCT"],
 "Q": ["CAA","CAG"],
 "R": ["AGA","AGG","CGA","CGC","CGG","CGT"],
 "S": ["AGC","AGT","TCA","TCC","TCG","TCT"],
 "T": ["ACA","ACC","ACG","ACT"],
 "V": ["GTA","GTC","GTG","GTT"],
 "W": ["TGG"],
 "Y": ["TAC","TAT"],
};

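EDIT: An example of an input string of nucleotides is "AAGCATAGAAATCGAGGG", with the corresponding output string "KHRNRG". Hope this helps!

4 Answers:

Answer 0: (score: 3)

Opinion

The first thing I would personally recommend is to build a dictionary that maps from 3-char codon to amino acid. This lets your program take several chains of codon strings and convert them to amino strings without having to search through every amino acid's codon list each time. The dictionary will work something like this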

codonDict['GCA'] // 'A'
codonDict['TGC'] // 'C'
// etc

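From there, I implemented two utility functions: slide and slideStr. These aren't particularly important, so I'll just introduce them with a few examples of input and output.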

slide (2,1) ([1,2,3,4])
// [[1,2], [2,3], [3,4]]

slide (2,2) ([1,2,3,4])
// [[1,2], [3,4]]

slideStr (2,1) ('abcd')
// ['ab', 'bc', 'cd']

slideStr (2,2) ('abcd')
// ['ab', 'cd']

With the reverse dictionary and the generic utility functions at our disposal, writing codon2amino is a breeze

// codon2amino :: String -> String
const codon2amino = str =>
  slideStr(3,3)(str)
    .map(c => codonDict[c])
    .join('')

Runnable demo

To clarify, we build codonDict from aminoDict once, and re-use it for every codon-to-amino computation.

// your original data renamed to aminoDict
const aminoDict = { 'A': ['GCA','GCC','GCG','GCT'], 'C': ['TGC','TGT'], 'D': ['GAC', 'GAT'], 'E': ['GAA','GAG'], 'F': ['TTC','TTT'], 'G': ['GGA','GGC','GGG','GGT'], 'H': ['CAC','CAT'], 'I': ['ATA','ATC','ATT'], 'K': ['AAA','AAG'], 'L': ['CTA','CTC','CTG','CTT','TTA','TTG'], 'M': ['ATG'], 'N': ['AAC','AAT'], 'P': ['CCA','CCC','CCG','CCT'], 'Q': ['CAA','CAG'], 'R': ['AGA','AGG','CGA','CGC','CGG','CGT'], 'S': ['AGC','AGT','TCA','TCC','TCG','TCT'], 'T': ['ACA','ACC','ACG','ACT'], 'V': ['GTA','GTC','GTG','GTT'], 'W': ['TGG'], 'Y': ['TAC','TAT'] };

// codon dictionary derived from aminoDict
const codonDict =
 Object.keys(aminoDict).reduce((dict, a) =>
   Object.assign(dict, ...aminoDict[a].map(c => ({[c]: a}))), {})

// slide :: (Int, Int) -> [a] -> [[a]]
const slide = (n,m) => xs => {
  if (n > xs.length)
    return []
  else
    return [xs.slice(0,n), ...slide(n,m) (xs.slice(m))]
}

// slideStr :: (Int, Int) -> String -> [String]
const slideStr = (n,m) => str =>
  slide(n,m) (Array.from(str)) .map(s => s.join(''))

// codon2amino :: String -> String
const codon2amino = str =>
  slideStr(3,3)(str)
    .map(c => codonDict[c])
    .join('')

console.log(codon2amino('AAGCATAGAAATCGAGGG'))
// KHRNRG
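
One caveat worth hedging: slide recurses once per window and copies the remaining input on every call, so a very long sequence might run into the engine's call stack limit. The sketch below shows a loop-based counterpart. The names slideIter and codon2aminoIter are hypothetical, and the codon table is abbreviated to the codons used by the sample input.

```javascript
// Hypothetical iterative counterpart of slide: same windows, no recursion.
// Works on anything that has .length and .slice (Arrays and Strings).
const slideIter = (n, m) => xs => {
  const windows = []
  // stop as soon as a full window of size n no longer fits
  for (let i = 0; i + n <= xs.length; i += m)
    windows.push(xs.slice(i, i + n))
  return windows
}

// Abbreviated codon -> amino acid table (assumption: sample codons only)
const codonDict = { AAG: 'K', CAT: 'H', AGA: 'R', AAT: 'N', CGA: 'R', GGG: 'G' }

// Same pipeline as codon2amino, but windowing the string directly
const codon2aminoIter = str =>
  slideIter(3, 3)(str)
    .map(c => codonDict[c])
    .join('')

console.log(codon2aminoIter('AAGCATAGAAATCGAGGG'))
// KHRNRG
```

Because strings already provide .slice, this version does not need the Array.from and join round trip that slideStr performs.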

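Further explanation

  

"Can you clarify what some of these variables are supposed to represent? (n, m, xs, c, etc.)"

Our slide function gives us a sliding window over an array. It expects two parameters for the window - n, the window size, and m, the step size - and one parameter that is the array of items to iterate through - xs, which can be read as x's, or plural x, as in a collection of x items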

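slide is purposefully generic. As written it relies only on .length and .slice, so it works on an Array or a String alike, and any other iterable (anything that implements Symbol.iterator) can be passed in after converting it with Array.from, which is what slideStr does. This is also why we use a generic name like xs: naming it something specific would lead us to think it can only work with one specific type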
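
A small sketch of how slide behaves on different kinds of input. The Set and the Array.from conversion are assumptions added for illustration; slide itself only reads .length and calls .slice, so values without those need to be converted first.

```javascript
// slide as defined in the answer, written as a single expression
const slide = (n, m) => xs =>
  n > xs.length
    ? []
    : [xs.slice(0, n), ...slide(n, m)(xs.slice(m))]

// Arrays and Strings both provide .length and .slice
const pairsFromArray = slide(2, 2)([1, 2, 3, 4])  // [[1, 2], [3, 4]]
const pairsFromString = slide(2, 1)('abcd')       // ['ab', 'bc', 'cd']

// Other iterables (here a Set) are copied into an Array first
const pairsFromSet = slide(2, 2)(Array.from(new Set([1, 2, 3, 4])))  // [[1, 2], [3, 4]]

console.log(pairsFromArray, pairsFromString, pairsFromSet)
```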

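The variable c in .map(c => codonDict[c]), like the others, is not particularly important - I named it c for codon, but we could have named it x or foo, it doesn't matter. The "trick" to understanding c is to understand .map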

[1,2,3,4,5].map(c => f(c))
// [f(1), f(2), f(3), f(4), f(5)]

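So all we're really doing here is taking an array ([1, 2, 3, 4, 5]) and creating a new array in which we call f on each element of the original array

Now when we look at .map(c => codonDict[c]), we understand that all we're doing is looking up each element c in codonDict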

const codon2amino = str =>
  slideStr(3,3)(str)          // [ 'AAG', 'CAT', 'AGA', 'AAT', ...]
    .map(c => codonDict[c])   // [ codonDict['AAG'], codonDict['CAT'], codonDict['AGA'], codonDict['AAT'], ...]
    .join('')                 // 'KHRN...'
  

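"Also, are these 'const' items able to essentially replace my original translateInput() function?"

If you're not familiar with ES6 (ES2015), some of the syntax used above might look foreign to you.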

// foo using traditional function syntax
function foo (x) { return x + 1 }

// foo as an arrow function
const foo = x => x + 1

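In short, yes, codon2amino is an exact replacement for your translateInput, just defined using a const binding and an arrow function. I chose codon2amino as the name because it better describes what the function does - translateInput doesn't say which way it translates (A to B, or B to A?), and "input" is a fairly meaningless descriptor here because all functions can take input.

The reason you see other const declarations is that we are splitting the work of your function across multiple functions. The reasons for this are mostly beyond the scope of this answer, but the short explanation is that one specialized function that takes on responsibility for several tasks is less useful to us than multiple generic functions that can be combined and re-used in sensible ways.

Sure, codon2amino needs to look at each 3-letter sequence in the input string, but that doesn't mean we have to write the string-splitting code inside the codon2amino function. We can write a generic string-splitting function, as we did with slideStr, which is useful to any function that wants to iterate through string sequences, and then have our codon2amino function use it. If we had encapsulated all of that string-splitting code inside codon2amino, the next time we needed to iterate through string sequences we would have to duplicate that portion of the code.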
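
To make the re-use argument concrete, here is a minimal sketch in which the same slideStr helper serves a different task. countPairs is a hypothetical function invented for this illustration; slide and slideStr are copied from the demo above.

```javascript
// slide :: (Int, Int) -> [a] -> [[a]]
const slide = (n, m) => xs =>
  n > xs.length
    ? []
    : [xs.slice(0, n), ...slide(n, m)(xs.slice(m))]

// slideStr :: (Int, Int) -> String -> [String]
const slideStr = (n, m) => str =>
  slide(n, m)(Array.from(str)).map(s => s.join(''))

// Hypothetical second consumer of slideStr: count overlapping
// two-letter windows, with no string-splitting code of its own
const countPairs = str =>
  slideStr(2, 1)(str).reduce((counts, pair) => {
    counts[pair] = (counts[pair] || 0) + 1
    return counts
  }, {})

console.log(countPairs('AAGAAG'))
// { AA: 2, AG: 2, GA: 1 }
```

Only the window size, the step, and the per-window operation change between countPairs and codon2amino; the splitting logic is written once.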

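All that said...

  

"Is there any way I can do this while keeping my original loop structure?"

I really think you should spend some time stepping through the code above to see how it works. There are a lot of valuable lessons to learn there if you haven't yet seen program concerns separated in this way.

Of course, that's not the only way to solve the problem. We can use a plain for loop. For me, it is more mental overhead to think about creating the counter i and manually incrementing it with i++ or i += 3, making sure to check i < str.length, reassigning the return value with result += something, and so on - add a couple more variables and your brain quickly turns to soup.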

function makeCodonDict (aminoDict) {
  let result = {}
  for (let k of Object.keys(aminoDict))
    for (let a of aminoDict[k])
      result[a] = k
  return result
}

function translateInput (dict, str) {
  let result = ''
  for (let i = 0; i < str.length; i += 3)
    result += dict[str.slice(i, i + 3)]
  return result
}

const aminoDict = { 'A': ['GCA','GCC','GCG','GCT'], 'C': ['TGC','TGT'], 'D': ['GAC', 'GAT'], 'E': ['GAA','GAG'], 'F': ['TTC','TTT'], 'G': ['GGA','GGC','GGG','GGT'], 'H': ['CAC','CAT'], 'I': ['ATA','ATC','ATT'], 'K': ['AAA','AAG'], 'L': ['CTA','CTC','CTG','CTT','TTA','TTG'], 'M': ['ATG'], 'N': ['AAC','AAT'], 'P': ['CCA','CCC','CCG','CCT'], 'Q': ['CAA','CAG'], 'R': ['AGA','AGG','CGA','CGC','CGG','CGT'], 'S': ['AGC','AGT','TCA','TCC','TCG','TCT'], 'T': ['ACA','ACC','ACG','ACT'], 'V': ['GTA','GTC','GTG','GTT'], 'W': ['TGG'], 'Y': ['TAC','TAT'] };
const codonDict = makeCodonDict(aminoDict)

const codons = 'AAGCATAGAAATCGAGGG'
const aminos = translateInput(codonDict, codons)
console.log(aminos) // KHRNRG
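
One thing to be aware of with the loop version: when a chunk is not a key of the dictionary, the lookup yields undefined and += appends the literal text "undefined". That can happen with the stop codons TAA, TAG and TGA, which the question's dictionary does not list, or with a trailing chunk shorter than three characters. A minimal sketch of a guarded variant follows; translateInputStrict and the '?' placeholder are hypothetical choices, and the table is abbreviated to the codons of the sample input.

```javascript
// Abbreviated codon -> amino acid table (assumption: sample codons only)
const codonDict = { AAG: 'K', CAT: 'H', AGA: 'R', AAT: 'N', CGA: 'R', GGG: 'G' }

// Hypothetical variant of translateInput with explicit handling of bad input
function translateInputStrict (dict, str, unknown = '?') {
  let result = ''
  // stop before a trailing chunk shorter than three characters
  for (let i = 0; i + 3 <= str.length; i += 3) {
    const codon = str.slice(i, i + 3)
    // fall back to the placeholder when the codon is not in the table
    result += Object.prototype.hasOwnProperty.call(dict, codon)
      ? dict[codon]
      : unknown
  }
  return result
}

console.log(translateInputStrict(codonDict, 'AAGCATAGAAATCGAGGG')) // KHRNRG
console.log(translateInputStrict(codonDict, 'AAGTAACA'))           // K?
```

Whether an unknown codon should become a placeholder, end the translation, or raise an error depends on what the extension needs; the placeholder is only one option.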

Answer 1: (score: 1)

Alternatively, you can write the above answer (@guest271314) in a compact form:

var res = ''
str.match(/.{1,3}/g).forEach(s => {
    var key = Object.keys(codon_dictionary).filter(x => codon_dictionary[x].filter(y => y === s).length > 0)[0]
    res += key != undefined ? key : ''
})

You can see the full answer below.

const codon_dictionary = { 
 "A": ["GCA","GCC","GCG","GCT"], 
 "C": ["TGC","TGT"], 
 "D": ["GAC", "GAT"],
 "E": ["GAA","GAG"],
 "F": ["TTC","TTT"],
 "G": ["GGA","GGC","GGG","GGT"],
 "H": ["CAC","CAT"],
 "I": ["ATA","ATC","ATT"],
 "K": ["AAA","AAG"],
 "L": ["CTA","CTC","CTG","CTT","TTA","TTG"],
 "M": ["ATG"],
 "N": ["AAC","AAT"],
 "P": ["CCA","CCC","CCG","CCT"],
 "Q": ["CAA","CAG"],
 "R": ["AGA","AGG","CGA","CGC","CGG","CGT"],
 "S": ["AGC","AGT","TCA","TCC","TCG","TCT"],
 "T": ["ACA","ACC","ACG","ACT"],
 "V": ["GTA","GTC","GTG","GTT"],
 "W": ["TGG"],
 "Y": ["TAC","TAT"],
};

const str = "AAGCATAGAAATCGAGGG";

let res = "";
// just rewrite the above code into the short answer
str.match(/.{1,3}/g).forEach(s => {
    var key = Object.keys(codon_dictionary).filter(x => codon_dictionary[x].filter(y => y === s).length > 0)[0]
    res += key != undefined ? key : ''
})

console.log(res);
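
The same per-codon search can also be sketched with Array.prototype.find and Array.prototype.includes instead of the nested filter calls. This is an illustrative variant, not part of the original answer, and the dictionary is abbreviated to the amino acids needed for the sample input.

```javascript
// Abbreviated amino acid -> codons table (assumption: sample input only)
const codon_dictionary = {
  "G": ["GGA","GGC","GGG","GGT"],
  "H": ["CAC","CAT"],
  "K": ["AAA","AAG"],
  "N": ["AAC","AAT"],
  "R": ["AGA","AGG","CGA","CGC","CGG","CGT"],
};

const str = "AAGCATAGAAATCGAGGG";

// For each chunk, find the first key whose codon list contains it;
// chunks without a match contribute an empty string
const res = str.match(/.{1,3}/g)
  .map(s => Object.keys(codon_dictionary)
    .find(k => codon_dictionary[k].includes(s)) || '')
  .join('');

console.log(res); // KHRNRG
```

Like the filter version, this scans the dictionary once per codon, so it trades lookup speed for not having to build a reversed table first.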

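Answer 2: (score: 0)

Hm, I'd suggest first changing the shape of your dictionary. In its current form it is not very useful for lookups, so let's do this: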

const dict = { 
 "A": ["GCA","GCC","GCG","GCT"], 
 "C": ["TGC","TGT"], 
 "D": ["GAC", "GAT"],
 "E": ["GAA","GAG"],
 "F": ["TTC","TTT"],
 "G": ["GGA","GGC","GGG","GGT"],
 "H": ["CAC","CAT"],
 "I": ["ATA","ATC","ATT"],
 "K": ["AAA","AAG"],
 "L": ["CTA","CTC","CTG","CTT","TTA","TTG"],
 "M": ["ATG"],
 "N": ["AAC","AAT"],
 "P": ["CCA","CCC","CCG","CCT"],
 "Q": ["CAA","CAG"],
 "R": ["AGA","AGG","CGA","CGC","CGG","CGT"],
 "S": ["AGC","AGT","TCA","TCC","TCG","TCT"],
 "T": ["ACA","ACC","ACG","ACT"],
 "V": ["GTA","GTC","GTG","GTT"],
 "W": ["TGG"],
 "Y": ["TAC","TAT"],
}
const codons = Object.keys(dict).reduce((a, b) => {dict[b].forEach(v => a[v] = b); return a}, {})

//In practice, you will get:

const codons = { GCA: 'A',
  GCC: 'A',
  GCG: 'A',
  GCT: 'A',
  TGC: 'C',
  TGT: 'C',
  GAC: 'D',
  GAT: 'D',
  GAA: 'E',
  GAG: 'E',
  TTC: 'F',
  TTT: 'F',
  GGA: 'G',
  GGC: 'G',
  GGG: 'G',
  GGT: 'G',
  CAC: 'H',
  CAT: 'H',
  ATA: 'I',
  ATC: 'I',
  ATT: 'I',
  AAA: 'K',
  AAG: 'K',
  CTA: 'L',
  CTC: 'L',
  CTG: 'L',
  CTT: 'L',
  TTA: 'L',
  TTG: 'L',
  ATG: 'M',
  AAC: 'N',
  AAT: 'N',
  CCA: 'P',
  CCC: 'P',
  CCG: 'P',
  CCT: 'P',
  CAA: 'Q',
  CAG: 'Q',
  AGA: 'R',
  AGG: 'R',
  CGA: 'R',
  CGC: 'R',
  CGG: 'R',
  CGT: 'R',
  AGC: 'S',
  AGT: 'S',
  TCA: 'S',
  TCC: 'S',
  TCG: 'S',
  TCT: 'S',
  ACA: 'T',
  ACC: 'T',
  ACG: 'T',
  ACT: 'T',
  GTA: 'V',
  GTC: 'V',
  GTG: 'V',
  GTT: 'V',
  TGG: 'W',
  TAC: 'Y',
  TAT: 'Y' }

//Now we are reasoning!

//From here on, it is pretty straightforward:

const rnaParser = s => s.match(/.{3}/g).map(fragment => codons[fragment]).join("")
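
A short usage sketch for the parser above, run against the sample input from the question. The dictionary is abbreviated to the amino acids that sample needs, and the `|| []` fallback is an added assumption: String.prototype.match returns null when the input contains no complete three-character chunk, which would otherwise make .map throw.

```javascript
// Abbreviated amino acid -> codons table (assumption: sample input only)
const dict = {
  "G": ["GGA","GGC","GGG","GGT"],
  "H": ["CAC","CAT"],
  "K": ["AAA","AAG"],
  "N": ["AAC","AAT"],
  "R": ["AGA","AGG","CGA","CGC","CGG","CGT"],
}

// Reverse the table: one entry per codon, pointing at its amino acid
const codons = Object.keys(dict).reduce((a, b) => {
  dict[b].forEach(v => a[v] = b)
  return a
}, {})

// Same parser, with a fallback for inputs shorter than one codon
const rnaParser = s =>
  (s.match(/.{3}/g) || []).map(fragment => codons[fragment]).join("")

console.log(rnaParser("AAGCATAGAAATCGAGGG")) // KHRNRG
console.log(rnaParser("AA") === "")          // true
```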

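Answer 3: (score: -1)

You can use a for loop with String.prototype.slice() to iterate over the string three characters at a time from the beginning of the string, a for..of loop with Object.entries() to iterate over the properties and values of the codon_dictionary object, and Array.prototype.includes() to match the current three-character portion of the input string against the array stored as a value of the codon_dictionary object, concatenating the matching property name onto a string variable.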
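
A minimal sketch of the approach described above, assuming an abbreviated codon_dictionary that covers only the sample input. It is an illustration of the described steps, not the answerer's original code.

```javascript
// Abbreviated amino acid -> codons table (assumption: sample input only)
const codon_dictionary = {
  "G": ["GGA","GGC","GGG","GGT"],
  "H": ["CAC","CAT"],
  "K": ["AAA","AAG"],
  "N": ["AAC","AAT"],
  "R": ["AGA","AGG","CGA","CGC","CGG","CGT"],
};

const str = "AAGCATAGAAATCGAGGG";
let res = "";

// Walk the input three characters at a time
for (let i = 0; i < str.length; i += 3) {
  const codon = str.slice(i, i + 3);
  // Check each amino acid's codon list for the current chunk
  for (const [amino, codonList] of Object.entries(codon_dictionary)) {
    if (codonList.includes(codon)) {
      res += amino;
      break; // a codon belongs to at most one amino acid
    }
  }
}

console.log(res); // KHRNRG
```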
