Question

I have a file like this:

EgrG_000961100.1    IPR001611
EgrG_000961100.1    IPR032675
EgrG_000961100.1    IPR000742
EgrG_000961100.1    IPR001791
EgrG_000961100.1    IPR001611
EgrG_000989200.1    IPR000668
EgrG_000989200.1    IPR013201
EgrG_000989200.1    IPR025660
EgrG_000989200.1    IPR000668
EgrG_000989200.1    IPR025661
EgrG_000989200.1    IPR000169
EgrG_000704400.1    IPR013780
EgrG_000704400.1    IPR015341
EgrG_000704400.1    IPR011682
EgrG_000704400.1    IPR015341
EgrG_000704400.1    IPR011013

I want one line per ID (ID = EgrG_*), with a second column containing all of that ID's IPRs, like this:

EgrG_000961100.1    IPR001611|IPR032675|IPR000742|IPR001791|IPR001611
EgrG_000989200.1    IPR000668|IPR025660|IPR000668|IPR025661|IPR000169
EgrG_000704400.1    IPR013780|IPR015341|IPR011682|IPR015341|IPR011013

I don't know how to do this in Python. Thanks in advance.

Answer 1

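# Read all lines; the with-block closes the file automatically.
with open("file", "r") as f:
    lines = f.readlines()

groups = {}  # dictionary where the key is your ID and the value is a list of its IPRs
for line in lines:
    if not line.strip():
        continue  # skip blank lines
    # split() with no argument splits on any whitespace (TAB or spaces)
    # and also drops the trailing newline
    ID, IPR = line.split()
    if ID in groups:  # dict.has_key() no longer exists in Python 3
        groups[ID].append(IPR)
    else:
        groups[ID] = [IPR]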

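Once you have the dictionary, just write it to a file in whatever format you want. I think this will work. There may be better or faster solutions, but I hope it helps.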
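
The write-out step mentioned above could look something like this. It is only a minimal sketch, not the one right way to do it: the function names `group_by_id` and `format_groups` and the output file name `output.txt` are made up for illustration, the columns are assumed to be separated by whitespace, and it assumes Python 3.7+ so that a plain dict keeps the IDs in the order they first appear.

```python
def group_by_id(lines):
    """Group the second column by the first, keeping first-seen ID order."""
    groups = {}
    for line in lines:
        fields = line.split()  # splits on tabs or spaces, drops the newline
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        key, value = fields
        # setdefault creates the list the first time an ID is seen
        groups.setdefault(key, []).append(value)
    return groups


def format_groups(groups, sep="|"):
    """Return one line per ID: the ID, a TAB, then its values joined by sep."""
    return ["{}\t{}".format(key, sep.join(values)) for key, values in groups.items()]


# A few rows from the question, used here as sample input.
sample = [
    "EgrG_000961100.1    IPR001611\n",
    "EgrG_000961100.1    IPR032675\n",
    "EgrG_000989200.1    IPR000668\n",
]
output_lines = format_groups(group_by_id(sample))
print("\n".join(output_lines))

# To write the result to a file instead (hypothetical file name):
# with open("output.txt", "w") as out:
#     out.write("\n".join(output_lines) + "\n")
```

Duplicates are kept on purpose, since the desired output in the question repeats IPR001611. If you only want unique IPRs per ID, you could skip values that are already in the list before appending.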

Python - Clustering data from a column
