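I'm new to Python and would like to see how this can be done in Python.
I have the data shown below in a file called data.txt, and I want to extract four fields from each record: first the degradome category, then the p-value, and then the text before and the text after Query:.
The result should look like this: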
Degardome Category: 4 Degradome p-value: 0.00120246641531374 3' AUUAAUAACCGGCCUGUUUGC 5' Seq_1950_218
Degardome Category: 4 Degradome p-value: 0.00360306320817827 3' ACUUUCUUUUCUUAA--UCUUUC 5' Seq_2171_593
data.txt:
Degardome Category: 4
Degradome p-value: 0.00120246641531374
T-Plot file: T-plots-IGR/Seq_5744_249_Supercontig_2.10_1257006_264_TPlot.pdf
Position Reads Category
264 1 4 <<<<<<<<<<
914 1 4
987 4 0
---------------------------------------------------
---------------------------------------------------
5' UUGGAGGUGGCUGGACGGAUG 3' Transcript: Supercontig_2.10_1395094:908-928 Slice Site:919
||||o||||oo|o|
3' AUUAAUAACCGGCCUGUUUGC 5' Query: Seq_1950_218
HV2.fasta_dd.txt
Degardome Category: 4
Degradome p-value: 0.00360306320817827
T-Plot file: T-plots-IGR/Seq_1950_218_Supercontig_2.10_1395094_919_TPlot.pdf
Position Reads Category
919 1 4 <<<<<<<<<<
---------------------------------------------------
---------------------------------------------------
5' AGAAGGGGAAGAGUGGAGGAGAG 3' Transcript: Supercontig_2.10_1543625:626-648 Slice Site:637
|||o|oo||||o| o||o||
3' ACUUUCUUUUCUUAA--UCUUUC 5' Query: Seq_2171_593
Answer 0 (score: 1)
If you read the whole file with
with open('data.txt', 'r') as f:
    a = f.read()
then splitting it on newlines,
a = a.split('\n')
gives the following list:
['Degardome Category: 4',
'Degradome p-value: 0.00120246641531374',
'T-Plot file: T-plots-IGR/Seq_5744_249_Supercontig_2.10_1257006_264_TPlot.pdf',
'',
'Position Reads Category',
'264 1 4 <<<<<<<<<<',
'914 1 4',
'987 4 0',
'---------------------------------------------------',
'---------------------------------------------------',
'',
"5' UUGGAGGUGGCUGGACGGAUG 3' Transcript: Supercontig_2.10_1395094:908-928 Slice Site:919",
' ||||o||||oo|o|',
"3' AUUAAUAACCGGCCUGUUUGC 5' Query: Seq_1950_218",
'HV2.fasta_dd.txt',
'Degardome Category: 4',
'Degradome p-value: 0.00360306320817827',
'T-Plot file: T-plots-IGR/Seq_1950_218_Supercontig_2.10_1395094_919_TPlot.pdf',
'',
'Position Reads Category',
'919 1 4 <<<<<<<<<<',
'---------------------------------------------------',
'---------------------------------------------------',
'',
"5' AGAAGGGGAAGAGUGGAGGAGAG 3' Transcript: Supercontig_2.10_1543625:626-648 Slice Site:637",
' |||o|oo||||o| o||o||',
"3' ACUUUCUUUUCUUAA--UCUUUC 5' Query: Seq_2171_593"]
Now initialize an empty string and concatenate the relevant parts of each line onto it:
In [4]: t = ''
In [5]: for line in a:
   ...:     if 'Degardome Category:' in line:
   ...:         t += line + ' '
   ...:     if 'Degradome p-value:' in line:
   ...:         t += line + ' '
   ...:     if 'Query' in line:
   ...:         t += line.replace('Query:', '') + '\n'
Finally, split the string on newlines and drop the empty trailing entry:
In [6]: out = [i for i in t.split('\n') if i]
In [7]: out
Out[7]:
["Degardome Category: 4 Degradome p-value: 0.00120246641531374 3' AUUAAUAACCGGCCUGUUUGC 5' Seq_1950_218",
 "Degardome Category: 4 Degradome p-value: 0.00360306320817827 3' ACUUUCUUUUCUUAA--UCUUUC 5' Seq_2171_593"]
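The same line-by-line idea can also be written without building one big string first. The following is only a sketch under a few assumptions: every record starts with a 'Degardome Category:' line and ends with a line containing 'Query:', and the function name extract_records and the sample list are hypothetical, made up for illustration.

```python
def extract_records(lines):
    """Yield one summary string per record found in `lines`."""
    parts = []
    for line in lines:
        line = line.strip()
        if line.startswith('Degardome Category:'):
            # A new record begins; anything left from an incomplete one is dropped.
            parts = [line]
        elif line.startswith('Degradome p-value:'):
            parts.append(line)
        elif 'Query:' in line:
            # Keep the text on both sides of 'Query:' and trim stray spaces.
            before, after = line.split('Query:', 1)
            parts.extend([before.strip(), after.strip()])
            yield ' '.join(parts)
            parts = []


# Hypothetical sample: a shortened copy of the first record in data.txt.
sample = [
    "Degardome Category: 4",
    "Degradome p-value: 0.00120246641531374",
    "Position Reads Category",
    "3' AUUAAUAACCGGCCUGUUUGC 5' Query: Seq_1950_218",
]
records = list(extract_records(sample))
print(records)
```

With a real file, the open file object can be passed directly, for example list(extract_records(f)) inside a with open('data.txt') as f: block, since iterating a file yields its lines.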
Answer 1 (score: 1)
A solution using the re module:
import re

# One compiled pattern per field of interest.
pattern1 = re.compile(r'Degardome Category')
pattern2 = re.compile(r'Degradome p-value')
pattern3 = re.compile(r'Query')

l1 = []  # category lines
l2 = []  # p-value lines
l3 = []  # query lines with the 'Query:' label removed

with open('/home/mayankp/data.txt') as f:
    for i in f:
        if pattern1.search(i):
            a = re.sub('\n', '', i)  # strip the trailing newline
            l1.append(a)
        elif pattern2.search(i):
            a = re.sub('\n', '', i)
            l2.append(a)
        elif pattern3.search(i):
            a = re.sub('Query:', '', i)
            b = re.sub('\n', '', a)
            l3.append(b)
In [1244]: output = list(zip(l1, l2, l3))
In [1245]: output
Out[1245]:
[('Degardome Category: 4',
'Degradome p-value: 0.00120246641531374',
"3' AUUAAUAACCGGCCUGUUUGC 5' Seq_1950_218"),
('Degardome Category: 4',
'Degradome p-value: 0.00360306320817827',
"3' ACUUUCUUUUCUUAA--UCUUUC 5' Seq_2171_593")]
You can now write this output to a file.
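As a minimal sketch of that last step, assuming output is a list of 3-tuples like the one shown above: the function name write_records and the file name result.txt are hypothetical, and the temporary directory is used only so the example can run anywhere.

```python
import os
import tempfile

# Stand-in for the zipped `output` list from the answer above.
output = [
    ('Degardome Category: 4',
     'Degradome p-value: 0.00120246641531374',
     "3' AUUAAUAACCGGCCUGUUUGC 5' Seq_1950_218"),
    ('Degardome Category: 4',
     'Degradome p-value: 0.00360306320817827',
     "3' ACUUUCUUUUCUUAA--UCUUUC 5' Seq_2171_593"),
]


def write_records(records, path):
    """Write each tuple as one space-separated line of text."""
    with open(path, 'w') as f:
        for record in records:
            f.write(' '.join(record) + '\n')


# Hypothetical output location; replace with any path you like.
out_path = os.path.join(tempfile.gettempdir(), 'result.txt')
write_records(output, out_path)

# Read the file back to check what was written.
with open(out_path) as f:
    written = f.read().splitlines()
print(written)
```

Joining each tuple with a single space reproduces the one-line-per-record layout asked for in the question; a tab or comma separator would work the same way if the file is meant to be read by another tool.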