Python - Grouping data from a column

Time: 2017-03-13 12:59:14

Tags: python python-2.7

I have a file like this:

EgrG_000961100.1    IPR001611
EgrG_000961100.1    IPR032675
EgrG_000961100.1    IPR000742
EgrG_000961100.1    IPR001791
EgrG_000961100.1    IPR001611
EgrG_000989200.1    IPR000668
EgrG_000989200.1    IPR013201
EgrG_000989200.1    IPR025660
EgrG_000989200.1    IPR000668
EgrG_000989200.1    IPR025661
EgrG_000989200.1    IPR000169
EgrG_000704400.1    IPR013780
EgrG_000704400.1    IPR015341
EgrG_000704400.1    IPR011682
EgrG_000704400.1    IPR015341
EgrG_000704400.1    IPR011013

I want to write one line per ID (ID = EgrG_*), with the next column containing all of that ID's IPRs, like this:

EgrG_000961100.1    IPR001611|IPR032675|IPR000742|IPR001791|IPR001611
EgrG_000989200.1    IPR000668|IPR025660|IPR000668|IPR025661|IPR000169
EgrG_000704400.1    IPR013780|IPR015341|IPR011682|IPR015341|IPR011013

I don't know how to do this in Python. Thanks in advance.

1 answer:

Answer 0: (score: 1)

groups = {}  # dictionary where the key is your ID and the value is a list of its IPRs
with open("file", "r") as f:
    for line in f:
        fields = line.split()  # splits on any whitespace, so TAB- or space-separated files both work
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        ID, IPR = fields
        if ID in groups:
            groups[ID].append(IPR)
        else:
            groups[ID] = [IPR]

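Once you have the dictionary, just write it to a file in whatever format you want. I think this will work. There may be better or faster solutions, but I hope it helps.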
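
For the writing step, here is a minimal sketch of one way it could look, not necessarily the best one. The helper names `group_by_id` and `format_groups` and the inline `sample` list are made up for illustration; the sketch assumes two whitespace-separated columns per line and uses `collections.OrderedDict` so that the IDs come out in the order they first appear (a plain dict in Python 2.7 does not guarantee that).

```python
from collections import OrderedDict


def group_by_id(lines):
    """Collect the second column of each line under the ID in the first column."""
    groups = OrderedDict()  # keeps IDs in order of first appearance
    for line in lines:
        parts = line.split()  # split on any whitespace (tabs or spaces)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        key, value = parts
        groups.setdefault(key, []).append(value)
    return groups


def format_groups(groups, sep="|"):
    """Return one 'ID<TAB>value|value|...' string per ID."""
    return ["{}\t{}".format(key, sep.join(values)) for key, values in groups.items()]


# Hypothetical sample standing in for the lines of the input file.
sample = [
    "EgrG_000961100.1    IPR001611\n",
    "EgrG_000961100.1    IPR032675\n",
    "EgrG_000989200.1    IPR000668\n",
]
for row in format_groups(group_by_id(sample)):
    print(row)
```

To use it on a real file, pass the open file object to `group_by_id` (it iterates line by line) and write each formatted row followed by a newline to the output file. Repeated IPRs are kept as they are, matching the desired output in the question; if duplicates should be dropped instead, the `append` step would need a membership check.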