Stripping a line in Python while keeping its indentation

Time: 2014-06-26 06:17:06

Tags: python arrays string

INPUT:

$target: ENSG00000097007|ABL1
length: 3075
miRNA : hsa-miR-203
length: 22

mfe: -30.5 kcal/mol
p-value: 0.606919

position:  2745
target 5'   C             G     C 3'
             GUGGUCCUGGACA   CAC    
             CACCAGGAUUUGU   GUG    
miRNA  3' GAU             AAA     5'

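I have to strip the last two lines, store them in two arrays, and read them character by character to get the output shown below.

After stripping, the lines should look like this: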

   CACCAGGAUUUGU   GUG    
GAU             AAA     

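A character read from line 1 should be printed in lowercase; a character read from line 2 should be printed in uppercase.

The final output of the program should be "GAUcaccaggauuuguAAAgug".

The code we tried does not strip the lines while keeping the exact alignment they have in the input.

Here is the code we used: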

from sys import argv

script, filename = argv
og1 = "AGUUCCUUUGUUUUGGUGACUG"
pattern = "              "
pattern1 = "miRNA  3'"
file = open(filename)
for line in file:
    # a line that starts with a run of spaces is an alignment row
    if line.startswith(pattern):
        n = file.next()  # the row that follows it (Python 2)
        # print n[9:],#  bound mirna
        for i in range(0, len(og1)):
            print og1[i],
        print "\n"
        for j in range(0, len(n)):
            print n[j],

Further input for the question:

target: ENSG00000142208|ENST00000349310|AKT1
length: 992
miRNA : hsa-miR-125b-5p
length: 22

mfe: -23.9 kcal/mol
p-value: 0.610132

position  168
target 5' C     C               A 3'
           CGCAG   GGGGU   AGGGA    
           GUGUU   UCCCA   UCCCU    
miRNA  3' A     CAA     GAG       5'


target: ENSG00000142208|ENST00000349310|AKT1
length: 992
miRNA : hsa-miR-149-3p
length: 21

mfe: -36.6 kcal/mol
p-value: 0.598318

position  798
target 5' C   UGUC     AGG        G 3'
           CGC    GCCCC   CCCUCCCU    
           GUG    CGGGG   GGGAGGGA    
miRNA  3' C   U        GCA          5'


target: ENSG00000142208|ENST00000349310|AKT1
length: 992
miRNA : hsa-miR-185-5p
length: 22

mfe: -27.8 kcal/mol
p-value: 0.606550

position  733
target 5' C       CUCCC   CAGAUGA        C 3'
           CGGGAGC     CCU       UCUCUCCA    
           GUCCUUG     GGA       AGAGAGGU    
miRNA  3' A       AC      A                5'


target: ENSG00000142208|ENST00000349310|AKT1
length: 992
miRNA : hsa-miR-199a-3p
length: 22

mfe: -21.9 kcal/mol
p-value: 0.611970

position  357
target 5' C      CC   CCU      U    C 3'
           AGCCAG   GC   GGGCUG CUGU    
           UUGGUU   CG   UCUGAU GACA    
miRNA  3' A      ACA                  5'


target: ENSG00000142208|ENST00000349310|AKT1
length: 992
miRNA : hsa-miR-451a
length: 21

mfe: -21.2 kcal/mol
p-value: 0.612523

position  416
target 5' C      UCAACC       A     3'
           CUCAGU      UGGUGGC        
           GAGUCA      ACCAUUG        
miRNA  3' U      UU           CCAAA 5'

2 answers:

Answer 0: (score: 0)

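file = open(filename)
# records are separated by a blank line followed by the next "target" header
for segment in file.read().split("\n\ntarget"):
    interested_lines = segment.split('\n')[-3:-1]  # fetch the last two alignment lines
    split1 = interested_lines[0].split()        # pieces of the paired row
    split2 = interested_lines[1].split()[2:-1]  # pieces of the miRNA row, without "miRNA", "3'" and "5'"
    # pad split1 with one empty piece if it is shorter than split2
    # (the comparison evaluates to 0 or 1, so this loop runs at most once)
    for i in range(0, len(split1) < len(split2)):
        split1.append("")

    req = ""
    # interleave: a miRNA-row piece as is, then the paired piece in lowercase
    for i in range(0, len(split2)):
        req += split2[i] + split1[i].lower()
    # append any paired pieces that are left over, unchanged
    for j in range(i + 1, len(split1)):
        req += split1[j]
    print req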

Output:

AguguuCAAucccaGAGucccu
CgugUcggggGCAgggaggga
AguccuugACggaAagagaggu
AuugguuACAcgUCUGAUGACA
UgagucaUUaccauugCCAAA
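
The same interleaving idea can be written more compactly in Python 3 with itertools.zip_longest. This is only a sketch, not the answerer's code: interleave_pieces is a hypothetical helper name, and it assumes the pieces of the two rows alternate, starting with a piece from the miRNA row, which appears to hold for the records shown in the question. Unlike the loop above, it also lowercases the paired pieces that are left over.

```python
from itertools import zip_longest


def interleave_pieces(paired_line, mirna_line):
    """Interleave the whitespace-separated pieces of the two alignment rows."""
    paired = paired_line.split()
    # Drop the "miRNA" and "3'" labels and the trailing "5'" marker.
    unpaired = mirna_line.split()[2:-1]
    parts = []
    # Assumption: pieces alternate, starting with one from the miRNA row.
    # zip_longest pads the shorter list with "" so no piece is lost.
    for upper, lower in zip_longest(unpaired, paired, fillvalue=""):
        parts.append(upper.upper() + lower.lower())
    return "".join(parts)
```

Because whitespace splitting throws away the column positions, this only works when the alternation assumption holds; a column-by-column merge does not depend on it.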

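Answer 1: (score: 0)

I split the text into lines, take the last two, then iterate over the two lines zipped together and keep whichever character is not " ". Since a space sorts before every letter, max() picks the base in each column.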

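def combinebases(base_data):
    lines = base_data.splitlines()[-2:]  # the two miRNA rows
    output = list()
    lines[0] = lines[0].lower()  # paired bases become lowercase
    for ch1, ch2 in zip(*lines):
        # a space sorts before every letter, so max() keeps the base
        output.append(max(ch1, ch2))
    # drop the 10-character "miRNA  3' " label and the last 4 characters ("  5'" in this record)
    return ''.join(output[10:-4])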

A possibly safer option (it requires import re) is to return:

    return re.search("(?<=miRNA  3' )[augc]+", ''.join(output), re.I).group()

However, if the base lengths are the same in every record, the regex is overkill.
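
For completeness, here is what the regex variant might look like as a whole function in Python 3. It is a sketch under assumptions: combinebases_regex is a hypothetical name, the label is assumed to be exactly "miRNA  3' " (with two spaces after "miRNA"), and None is returned when the label is not found. Note that the pattern stops at the first column where neither row has a base, so it only returns the full sequence when the merged rows have no such gap.

```python
import re


def combinebases_regex(base_data):
    """Merge the last two lines and extract the bases after the miRNA label."""
    paired, mirna = base_data.splitlines()[-2:]
    # A space sorts before every letter, so max() keeps the base in each column.
    merged = "".join(max(a, b) for a, b in zip(paired.lower(), mirna))
    # Same pattern as above: a run of bases right after the "miRNA  3' " label.
    match = re.search(r"(?<=miRNA  3' )[augc]+", merged, re.I)
    return match.group() if match else None
```

The lookbehind keeps the label out of the match, so no fixed slice indices are needed at either end.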

Result with combinebases:

>>> txt = """$target: ENSG00000097007|ABL1
length: 3075
miRNA : hsa-miR-203
length: 22

mfe: -30.5 kcal/mol
p-value: 0.606919

position:  2745
target 5'   C             G     C 3'
             GUGGUCCUGGACA   CAC    
             CACCAGGAUUUGU   GUG    
miRNA  3' GAU             AAA     5'"""

>>> combinebases(txt)
'GAUcaccaggauuuguAAAgug'
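
To apply the column-by-column idea to a file with many records, one option is to look for every line that starts with the "miRNA  3'" label and merge it with the line directly above it. The sketch below is written for Python 3 and is not part of the answer: merge_duplex, merge_all and label_width are hypothetical names, and it assumes the label is always 10 characters wide, the miRNA row ends with a "5'" marker, and columns where neither row has a base should simply be skipped.

```python
from itertools import zip_longest


def merge_duplex(paired_line, mirna_line, label_width=10):
    """Merge two alignment rows column by column, skipping empty columns."""
    upper = mirna_line.rstrip()
    if upper.endswith("5'"):
        upper = upper[:-2]  # drop the trailing 5' marker
    upper = upper[label_width:]  # drop the "miRNA  3' " label
    lower = paired_line[label_width:]
    merged = []
    # zip_longest pads the shorter row, so stripped trailing spaces do no harm.
    for low, up in zip_longest(lower, upper, fillvalue=" "):
        if up != " ":
            merged.append(up.upper())  # base from the miRNA row
        elif low != " ":
            merged.append(low.lower())  # base from the paired row
    return "".join(merged)


def merge_all(text):
    """Return one merged sequence per "miRNA  3'" row found in the text."""
    lines = text.splitlines()
    results = []
    for index, line in enumerate(lines):
        if index > 0 and line.startswith("miRNA  3'"):
            results.append(merge_duplex(lines[index - 1], line))
    return results
```

For the txt record shown above, merge_all(txt) should return ['GAUcaccaggauuuguAAAgug']. Slicing by column index instead of calling split() or strip() is what keeps the two rows aligned.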