re.match not working correctly in AT content calculator

Time: 2015-05-28 14:22:58

Tags: python bioinformatics

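I have run into some problems with this code: for some reason, my validation check method is not working properly. All I want to do is verify that the user's input contains only the letters G, C, A, T before moving on to the at_calculate method, which performs the math on the input sequence. Any help/tips would be much appreciated.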

import re

from tkinter import *

class AT_content_calculator:

    def __init__(self, master):
        #initialising various widgets
        frame_1 = Frame(master)
        frame_1.pack()

        self.varoutput_1 = StringVar()

        self.label_1 = Label(frame_1, text="Please enter a DNA sequence:")
        self.label_1.pack()
        self.entry_1 = Entry(frame_1, textvariable=self.dna_sequence)
        self.entry_1.pack()
        self.output_1 = Label(frame_1, textvariable=self.varoutput_1)
        self.output_1.pack()
        self.button_1 = Button(frame_1, text="Calculate", command=self.validation_check)
        self.button_1.pack()

    def dna_sequence(self):
        self.dna_sequence = ()

    def validation_check(self):
        #used to validate that self.dna_sequence only contains letters G, C, A, T
        if re.match(r"GCAT", self.dna_sequence):
            self.at_calculate()
        else:
            self.varoutput_1.append = "Invalid DNA sequence. Please enter again."
            self.validation_check()

    def at_calculate(self):
        #used to calculate AT content of string stored in self.dna_sequence
        self.dna_sequence = self.entry_1.get()
        self.total_bases = len(self.dna_sequence)
        self.a_bases = self.dna_sequence.count("A")
        self.b_bases = self.dna_sequence.count("T")
        self.at_content = "%.2f" % ((self.a_bases + self.b_bases) / self.total_bases)
        self.varoutput_1.set("AT content percentage: " + self.at_content)

root = Tk()
root.title("AT content calculator")
root.geometry("320x320")
b = AT_content_calculator(root)
root.mainloop()

1 Answer:

Answer 0: (score: 2)

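If you want to verify that the user's input contains only the letters G, C, A, T, you need to put those characters in a character class, which matches any combination of them.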

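Note that self.dna_sequence is a method, so you cannot pass it to re.match directly, and as written it does not return the user's input anyway. You need to return the input value from that method: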

def dna_sequence(self):
     dna_sequence = self.entry_1.get()
     return dna_sequence

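Then do:

if re.fullmatch(r"[GCAT]+", self.dna_sequence()):

[GCAT]+ matches any combination of those characters with a length of 1 or more. re.fullmatch is used here because re.match only anchors at the start of the string, so re.match(r"[GCAT]+", ...) would also accept input such as GCATX. If you want the length to be exactly 4, you can use [GCAT]{4}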
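As a minimal sketch of the character-class check outside the GUI (assuming Python 3.4+ for re.fullmatch; the helper names is_valid_dna and at_content_percent are hypothetical and not part of the original code):

```python
import re

# Hypothetical helpers for illustration; not part of the asker's class.
VALID_DNA = re.compile(r"[GCAT]+")


def is_valid_dna(sequence):
    """Return True only if every character is one of G, C, A, T."""
    return VALID_DNA.fullmatch(sequence) is not None


def at_content_percent(sequence):
    """Return the AT content of an already validated sequence, in percent."""
    at_bases = sequence.count("A") + sequence.count("T")
    return 100.0 * at_bases / len(sequence)


for candidate in ("GATTACA", "GCATX", ""):
    if is_valid_dna(candidate):
        print(candidate, "%.2f" % at_content_percent(candidate))
    else:
        print(repr(candidate), "is not a valid DNA sequence")
```

Validating first also guards the division: an empty string fails the check, so len(sequence) is never zero when the percentage is computed.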

But this also matches repeated characters, such as GGCC. If you don't want that, you can use set.intersection:

if len(self.dna_sequence())==4 and len(set("GCAT").intersection(self.dna_sequence()))==4:
      #do stuff

Or, a better way:

if sorted(self.dna_sequence()) == list("ACGT"):
      #do stuff
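The two "each letter exactly once" checks can be tried side by side on plain strings. This is only a sketch; the function names are hypothetical, and both assume the input is an ordinary Python str:

```python
def each_base_once_sorted(sequence):
    """True if sequence is some ordering of G, C, A, T with no repeats."""
    # sorted() returns a list, so compare against a list, not a string.
    return sorted(sequence) == sorted("GCAT")


def each_base_once_set(sequence):
    """Same check using a length test plus a set intersection."""
    return len(sequence) == 4 and len(set("GCAT").intersection(sequence)) == 4


for candidate in ("TGCA", "GGCC", "GCAX", "GCATG"):
    print(candidate, each_base_once_sorted(candidate), each_base_once_set(candidate))
```

Both functions should agree on every input; the sorted version is shorter because it checks the length and the contents in a single comparison.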