I'm working in Python and have the following data:
['DDX58_HUMAN', 'gnl|CDD|256537', '819', '923']
['DDX58_HUMAN', 'gnl|CDD|260076', '111', '189']
['DDX58_HUMAN', 'gnl|CDD|260076', '4', '93']
['DDX58_HUMAN', 'gnl|CDD|238005', '258', '410']
['DDX58_HUMAN', 'gnl|CDD|238034', '606', '741']
['DICER_HUMAN', 'gnl|CDD|239209', '886', '1008']
['DICER_HUMAN', 'gnl|CDD|238333', '1681', '1846']
['DICER_HUMAN', 'gnl|CDD|238333', '1296', '1376']
['DICER_HUMAN', 'gnl|CDD|238333', '1547', '1583']
['DICER_HUMAN', 'gnl|CDD|251903', '630', '722']
['DICER_HUMAN', 'gnl|CDD|238005', '58', '209']
['DICER_HUMAN', 'gnl|CDD|238034', '444', '553']
I need to group the rows by their first item and, for each group, print the 2nd, 3rd and 4th items of every matching row:
DDX58_HUMAN gnl|CDD|256537 819 923 gnl|CDD|260076 111 189 gnl|CDD|260076 4 93 gnl|CDD|238005 258 410 gnl|CDD|238034 606 741
DICER_HUMAN gnl|CDD|239209 886 1008 gnl|CDD|238333 1681 1846 gnl|CDD|238333 1296 1376 gnl|CDD|238333 1547 1583 gnl|CDD|251903 630 722 gnl|CDD|238005 58 209 gnl|CDD|238034 444 553
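How can I do this?
Answer 0 (score: 0)
Here is sample code for what you want to do. I'm assuming the data is held in a Python list of lists. Iterate over the sublists and store the values in a dictionary keyed on each sublist's first element; that gives you exactly one entry per unique first element.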
mylist = [['DDX58_HUMAN', 'gnl|CDD|256537', '819', '923']
,['DDX58_HUMAN', 'gnl|CDD|260076', '111', '189']
,['DDX58_HUMAN', 'gnl|CDD|260076', '4', '93']
,['DDX58_HUMAN', 'gnl|CDD|238005', '258', '410']
,['DDX58_HUMAN', 'gnl|CDD|238034', '606', '741']
,['DICER_HUMAN', 'gnl|CDD|239209', '886', '1008']
,['DICER_HUMAN', 'gnl|CDD|238333', '1681', '1846']
,['DICER_HUMAN', 'gnl|CDD|238333', '1296', '1376']
,['DICER_HUMAN', 'gnl|CDD|238333', '1547', '1583']
,['DICER_HUMAN', 'gnl|CDD|251903', '630', '722']
,['DICER_HUMAN', 'gnl|CDD|238005', '58', '209']
,['DICER_HUMAN', 'gnl|CDD|238034', '444', '553']]
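# Group the remaining fields of each row under the row's first element.
myDict = {}
for items in mylist:
    myDict.setdefault(items[0], []).append(" ".join(items[1:]))
# Print each key followed by all of its grouped fields on one line.
for k, v in myDict.items():
    print(k, ":", " ".join(v))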
Output
DDX58_HUMAN : gnl|CDD|256537 819 923 gnl|CDD|260076 111 189 gnl|CDD|260076 4 93 gnl|CDD|238005 258 410 gnl|CDD|238034 606 741
DICER_HUMAN : gnl|CDD|239209 886 1008 gnl|CDD|238333 1681 1846 gnl|CDD|238333 1296 1376 gnl|CDD|238333 1547 1583 gnl|CDD|251903 630 722 gnl|CDD|238005 58 209 gnl|CDD|238034 444 553
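The output above puts a " : " after the key, while the question asks for the key followed directly by the fields. A minimal sketch of a variant that prints that format, using `collections.defaultdict` instead of `setdefault`, might look like this. The names `rows`, `group_rows` and `format_groups` are made up for illustration, and only three of the rows are included to keep it short.

```python
from collections import defaultdict

# A shortened copy of the data from the question.
rows = [
    ['DDX58_HUMAN', 'gnl|CDD|256537', '819', '923'],
    ['DDX58_HUMAN', 'gnl|CDD|260076', '111', '189'],
    ['DICER_HUMAN', 'gnl|CDD|239209', '886', '1008'],
]

def group_rows(rows):
    """Map each first field to a flat list of the remaining fields, in input order."""
    grouped = defaultdict(list)
    for key, *rest in rows:
        grouped[key].extend(rest)
    return grouped

def format_groups(grouped):
    """Build one space-separated line per key: the key, then all of its fields."""
    return [" ".join([key, *fields]) for key, fields in grouped.items()]

for line in format_groups(group_rows(rows)):
    print(line)
```

On Python 3.7 and later, dictionaries keep insertion order, so the groups come out in the order their keys first appear in the data.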
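If your data is in a .txt file, read the file line by line and use the re module to strip the unwanted square brackets; the same logic as above then applies.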
import re

myDict = {}
with open("data.txt") as infile:
    for line in infile:
        if not line.strip():
            continue  # skip blank lines
        # Remove the square brackets, split on commas, and trim whitespace.
        # The quotes around each field are kept, so they show up in the output.
        fields = [f.strip() for f in re.sub(r"[\[\]]", "", line.strip()).split(",")]
        myDict.setdefault(fields[0], []).append(" ".join(fields[1:]))

for k, v in myDict.items():
    print(k, ":", " ".join(v))
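Output (Python 3.7+ prints the keys in insertion order, so DDX58_HUMAN comes first there; older versions may print them in any order, as here)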
'DICER_HUMAN' : 'gnl|CDD|239209' '886' '1008' 'gnl|CDD|238333' '1681' '1846' 'gnl|CDD|238333' '1296' '1376' 'gnl|CDD|238333' '1547' '1583' 'gnl|CDD|251903' '630' '722' 'gnl|CDD|238005' '58' '209' 'gnl|CDD|238034' '444' '553'
'DDX58_HUMAN' : 'gnl|CDD|256537' '819' '923' 'gnl|CDD|260076' '111' '189' 'gnl|CDD|260076' '4' '93' 'gnl|CDD|238005' '258' '410' 'gnl|CDD|238034' '606' '741'
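The quotes survive in this output because the regex removes only the brackets. If every line in the file really is a valid Python list literal, as the data in the question suggests (that is an assumption), one alternative is to parse each line with `ast.literal_eval`, which returns the strings already unquoted. The sketch below is only an illustration; `parse_lines`, `group_lines` and `sample` are hypothetical names, and `sample` stands in for the lines of the file.

```python
import ast

def parse_lines(lines):
    """Parse lines that each look like a Python list literal into lists of strings."""
    rows = []
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            rows.append(ast.literal_eval(line))
    return rows

def group_lines(lines):
    """Group the fields after the first one under the first field of each line."""
    grouped = {}
    for key, *rest in parse_lines(lines):
        grouped.setdefault(key, []).extend(rest)
    return grouped

# Stand-in for the contents of the text file (assumed format).
sample = [
    "['DDX58_HUMAN', 'gnl|CDD|256537', '819', '923']\n",
    "['DDX58_HUMAN', 'gnl|CDD|260076', '111', '189']\n",
    "['DICER_HUMAN', 'gnl|CDD|239209', '886', '1008']\n",
]

for key, fields in group_lines(sample).items():
    print(key, ":", " ".join(fields))
```

To run it against a real file, pass the open file object in place of `sample`, for example inside `with open("data.txt") as infile:`. `ast.literal_eval` raises an exception on a line that is not a valid literal, so malformed lines would need to be handled separately.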