INPUT:
$target: ENSG00000097007|ABL1
length: 3075
miRNA : hsa-miR-203
length: 22
mfe: -30.5 kcal/mol
p-value: 0.606919
position: 2745
target 5' C G C 3'
GUGGUCCUGGACA CAC
CACCAGGAUUUGU GUG
miRNA 3' GAU AAA 5'
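I need to extract the last two lines, store each in its own array, read them character by character, and produce the output shown below.
After stripping, the two lines should look like this: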
CACCAGGAUUUGU GUG
GAU AAA
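A character read from line 1 should be printed in lowercase; a character read from line 2 should be printed in uppercase.
The final output of the program should be "GAUcaccaggauuuguAAAgug".
Our attempt at reading the file does not strip the lines while keeping them aligned exactly as in the input. Here is the code we used: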
import fileinput
import sys
from sys import argv

script, filename = argv
og1 = "AGUUCCUUUGUUUUGGUGACUG"
pattern = " "
pattern1 = "miRNA 3'"
file = open(filename)
for line in file:
    if line.startswith(pattern):
        n = file.next()
        # print n[9:],  # bound mirna
        for i in range(0, len(og1)):
            print og1[i],
        print "\n"
        for j in range(0, len(n)):
            print n[j],
Further input for the question:
target: ENSG00000142208|ENST00000349310|AKT1
length: 992
miRNA : hsa-miR-125b-5p
length: 22
mfe: -23.9 kcal/mol
p-value: 0.610132
position 168
target 5' C C A 3'
CGCAG GGGGU AGGGA
GUGUU UCCCA UCCCU
miRNA 3' A CAA GAG 5'
target: ENSG00000142208|ENST00000349310|AKT1
length: 992
miRNA : hsa-miR-149-3p
length: 21
mfe: -36.6 kcal/mol
p-value: 0.598318
position 798
target 5' C UGUC AGG G 3'
CGC GCCCC CCCUCCCU
GUG CGGGG GGGAGGGA
miRNA 3' C U GCA 5'
target: ENSG00000142208|ENST00000349310|AKT1
length: 992
miRNA : hsa-miR-185-5p
length: 22
mfe: -27.8 kcal/mol
p-value: 0.606550
position 733
target 5' C CUCCC CAGAUGA C 3'
CGGGAGC CCU UCUCUCCA
GUCCUUG GGA AGAGAGGU
miRNA 3' A AC A 5'
target: ENSG00000142208|ENST00000349310|AKT1
length: 992
miRNA : hsa-miR-199a-3p
length: 22
mfe: -21.9 kcal/mol
p-value: 0.611970
position 357
target 5' C CC CCU U C 3'
AGCCAG GC GGGCUG CUGU
UUGGUU CG UCUGAU GACA
miRNA 3' A ACA 5'
target: ENSG00000142208|ENST00000349310|AKT1
length: 992
miRNA : hsa-miR-451a
length: 21
mfe: -21.2 kcal/mol
p-value: 0.612523
position 416
target 5' C UCAACC A 3'
CUCAGU UGGUGGC
GAGUCA ACCAUUG
miRNA 3' U UU CCAAA 5'
Answer 0 (score: 0)
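file = open(filename)
# Records are separated by a blank line followed by "target".
for segment in file.read().split("\n\ntarget"):
    interested_lines = segment.split('\n')[-3:-1]  # Fetch last two lines
    split1 = interested_lines[0].split()        # bound miRNA chunks
    split2 = interested_lines[1].split()[2:-1]  # unbound chunks, without the "miRNA 3'" label and "5'"
    # The comparison yields a bool, so this appends at most one "" to split1.
    for i in range(0, len(split1) < len(split2)):
        split1.append("")
    req = ""
    # Interleave: unbound chunk as is, then bound chunk in lowercase.
    for i in range(0, len(split2)):
        req += split2[i] + split1[i].lower()
    # Append any bound chunks that are left over.
    for j in range(i + 1, len(split1)):
        req += split1[j]
    print req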
Output:
AguguuCAAucccaGAGucccu
CgugUcggggGCAgggaggga
AguccuugACggaAagagaggu
AuugguuACAcgUCUGAUGACA
UgagucaUUaccauugCCAAA
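Answer 1 (score: 0)
I split the text into lines, grab the last two, then iterate over the two lines zipped together and keep whichever character is not " ".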
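def combinebases(base_data):
    lines = base_data.splitlines()[-2:]
    output = list()
    lines[0] = lines[0].lower()  # bound bases become lowercase
    # " " sorts below every letter, so max() keeps the non-space character
    for ch1, ch2 in zip(*lines):
        output.append(max(ch1, ch2))
    # fixed offsets: drop the leading label and the trailing 5' marker
    return ''.join(output[10:-4])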
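A possibly safer option is to return the following instead (this needs import re):
    return re.search("(?<=miRNA 3' )[augc]+", ''.join(output), re.I).group()
However, if every base sequence has the same length, the regex is overkill.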
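For reference, here is a minimal Python 3 sketch of the regex idea as a complete function. The name `combine_bases_regex`, the more tolerant pattern, and the column spacing of the sample are all assumptions made for illustration; the exact whitespace of the original input is not preserved in this post, so the sample is built explicitly.

```python
import re

def combine_bases_regex(base_data):
    """Merge the last two alignment lines column by column (hypothetical helper)."""
    bound, unbound = base_data.splitlines()[-2:]
    # Pad both lines to the same width so zip() does not cut off the longer one.
    width = max(len(bound), len(unbound))
    bound = bound.lower().ljust(width)
    unbound = unbound.ljust(width)
    # ' ' sorts below every letter, so max() keeps the non-blank character.
    merged = ''.join(max(a, b) for a, b in zip(bound, unbound))
    # Take the run of bases that follows the 3' label, however it is spaced.
    match = re.search(r"3'\s+([acgu]+)", merged, re.I)
    return match.group(1) if match else ''

# Assumed layout: bound bases sit in the gaps of the miRNA line.
sample = "\n".join([
    " " * 13 + "CACCAGGAUUUGU" + " " * 3 + "GUG",
    "miRNA  3' GAU" + " " * 13 + "AAA" + " " * 4 + "5'",
])
print(combine_bases_regex(sample))  # GAUcaccaggauuuguAAAgug
```

Unlike the fixed `[10:-4]` slice, this does not depend on the label width or on trailing padding, at the cost of one extra regex search.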
Result:
>>> txt = """$target: ENSG00000097007|ABL1
length: 3075
miRNA : hsa-miR-203
length: 22
mfe: -30.5 kcal/mol
p-value: 0.606919
position: 2745
target 5' C G C 3'
GUGGUCCUGGACA CAC
CACCAGGAUUUGU GUG
miRNA 3' GAU AAA 5'"""
>>> combinebases(txt)
'GAUcaccaggauuuguAAAgug'
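To handle a file with several records, one option is to scan line by line and merge each `miRNA 3' ... 5'` line with the line directly above it. The following Python 3 sketch assumes that layout; `merge_pair` and `merge_records` are hypothetical names, and the sample spacing is again an assumption rather than the original input.

```python
from itertools import zip_longest

def merge_pair(bound, unbound):
    """Bound bases in lowercase, unbound bases in uppercase, in column order."""
    start = unbound.index("3'") + 2   # skip the "miRNA 3'" label
    end = unbound.rindex("5'")        # stop before the trailing 5' marker
    out = []
    pairs = zip_longest(bound[start:end], unbound[start:end], fillvalue=' ')
    for low, up in pairs:
        if up != ' ':
            out.append(up.upper())
        elif low != ' ':
            out.append(low.lower())
    return ''.join(out)

def merge_records(text):
    """Yield one merged string per alignment found in the text."""
    previous = ''
    for line in text.splitlines():
        # The "miRNA : name" header line has no 3'/5' markers and is skipped.
        if line.startswith('miRNA') and "3'" in line and "5'" in line:
            yield merge_pair(previous, line)
        previous = line

# Assumed layout for the first record of the question.
bound = " " * 13 + "CACCAGGAUUUGU" + " " * 3 + "GUG"
unbound = "miRNA  3' GAU" + " " * 13 + "AAA" + " " * 4 + "5'"
sample = "\n".join(["miRNA : hsa-miR-203", "length: 22", bound, unbound])
print(list(merge_records(sample)))  # ['GAUcaccaggauuuguAAAgug']
```

Because it only looks at the miRNA line and its predecessor, this sketch does not care how records are separated in the file.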