I have a sample dataset A. It looks like this:
1:CH,AG,ME,GS;AP,CH;HE,AC;AC,AG
2:CA;HE,AT;AT,AC;AT,OG
3:NE,AG,AC;CS,OD
The expected result is:
['CH','AG','ME','GS','AP','CH','HE','AC','AC','AG','CA','HE','AT','AT','AC','AT','OG','NE','AG','AC','CS','OD']
I'm not sure how to write the Python code that turns this data into a list.
Answer 0 (score: 4)
One option is to use a regular expression to find every pair of consecutive uppercase letters:
In [1]: import re
In [2]: data = """
...: 1:CH,AG,ME,GS;AP,CH;HE,AC;AC,AG
...: 2:CA;HE,AT;AT,AC;AT,OG
...: 3:NE,AG,AC;CS,OD"""
In [3]: re.findall(r"[A-Z]{2}", data)  # re.MULTILINE is not needed: the pattern has no ^ or $
Out[3]:
['CH',
'AG',
'ME',
'GS',
'AP',
'CH',
'HE',
'AC',
'AC',
'AG',
'CA',
'HE',
'AT',
'AT',
'AC',
'AT',
'OG',
'NE',
'AG',
'AC',
'CS',
'OD']
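Note that `[A-Z]{2}` matches any two adjacent capitals, so a longer code such as `ABC` would come back as `AB`. If the codes might not always be exactly two letters, a minimal sketch that splits on the delimiters instead could look like this. The helper name `extract_codes` is made up for illustration, and it assumes every line starts with a `N:` label as in the sample data.

```python
import re

data = """1:CH,AG,ME,GS;AP,CH;HE,AC;AC,AG
2:CA;HE,AT;AT,AC;AT,OG
3:NE,AG,AC;CS,OD"""


def extract_codes(text):
    """Return every code in order, dropping the leading 'N:' label on each line.

    Hypothetical helper: assumes each non-empty line looks like 'N:payload'.
    """
    codes = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Keep only what follows the first ':' (the label itself is discarded).
        _, _, payload = line.partition(":")
        # Both ',' and ';' act as separators; drop empty tokens.
        codes.extend(tok for tok in re.split(r"[,;]", payload) if tok)
    return codes


result = extract_codes(data)
print(result)
```

This keeps the original order and the duplicates, which is what the expected result in the question asks for.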
Answer 1 (score: 0)
If you are using Python 2.7, try this:
a = "CH,AG,ME,GS;AP,CH;HE,AC;AC,AG"
b = "CA;HE,AT;AT,AC;AT,OG"
c = "NE,AG,AC;CS,OD"
d = a+','+b+','+c
d = d.replace(';',',')
print d.split(',')  # prints the expected list
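For Python 3, where `print` is a function, the same replace-and-split idea might be sketched as follows. This version starts from the raw lines including their `N:` labels rather than hand-typed strings; the variable names (`lines`, `payloads`, `flat`) are illustrative only, and it assumes every line contains a `:`.

```python
lines = [
    "1:CH,AG,ME,GS;AP,CH;HE,AC;AC,AG",
    "2:CA;HE,AT;AT,AC;AT,OG",
    "3:NE,AG,AC;CS,OD",
]

# Drop the "N:" label from each line (assumes a ':' is always present).
payloads = [line.split(":", 1)[1] for line in lines]

# Join the lines with ',', then treat ';' as just another ','.
joined = ",".join(payloads).replace(";", ",")

flat = joined.split(",")
print(flat)
```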