I have a file like this:
EgrG_000961100.1 IPR001611
EgrG_000961100.1 IPR032675
EgrG_000961100.1 IPR000742
EgrG_000961100.1 IPR001791
EgrG_000961100.1 IPR001611
EgrG_000989200.1 IPR000668
EgrG_000989200.1 IPR013201
EgrG_000989200.1 IPR025660
EgrG_000989200.1 IPR000668
EgrG_000989200.1 IPR025661
EgrG_000989200.1 IPR000169
EgrG_000704400.1 IPR013780
EgrG_000704400.1 IPR015341
EgrG_000704400.1 IPR011682
EgrG_000704400.1 IPR015341
EgrG_000704400.1 IPR011013
I want one line per ID (ID = EgrG_*), with the next column containing all of that ID's IPRs, like this:
EgrG_000961100.1 IPR001611|IPR032675|IPR000742|IPR001791|IPR001611
EgrG_000989200.1 IPR000668|IPR025660|IPR000668|IPR025661|IPR000169
EgrG_000704400.1 IPR013780|IPR015341|IPR011682|IPR015341|IPR011013
I don't know how to do this in Python. Thanks in advance.
Answer 0 (score: 1)
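# Read every line of the input; "file" is a placeholder for your file's path
with open("file") as f:
    lines = f.readlines()

ipr_by_id = {}  # create a dictionary where the key is your ID and the value a list of IPRs
for line in lines:
    fields = line.split()  # split on any whitespace, so tabs and spaces both work
    if len(fields) != 2:
        continue  # skip blank or malformed lines
    ID, IPR = fields
    if ID in ipr_by_id:
        ipr_by_id[ID].append(IPR)
    else:
        ipr_by_id[ID] = [IPR]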
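Once you have the dictionary, just write it to a file in whatever format you want. I think this will work. There may be better or faster solutions, but I hope it helps.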
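The writing step is left open above, so here is a minimal sketch of the whole round trip, assuming Python 3.7+ (where dictionaries keep insertion order) and whitespace-separated columns. The function names `group_iprs`, `format_rows`, and `write_grouped`, and the output file name in the usage note, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def group_iprs(lines):
    """Collect the IPRs of each ID, keeping first-seen ID order and duplicates."""
    grouped = defaultdict(list)
    for line in lines:
        fields = line.split()  # any whitespace: tabs or spaces
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        seq_id, ipr = fields
        grouped[seq_id].append(ipr)
    return grouped


def format_rows(grouped):
    """Build one 'ID IPR1|IPR2|...' string per ID."""
    return [seq_id + " " + "|".join(iprs) for seq_id, iprs in grouped.items()]


def write_grouped(in_path, out_path):
    """Read the two-column file at in_path and write the grouped rows to out_path."""
    with open(in_path) as src:
        grouped = group_iprs(src)
    with open(out_path, "w") as dst:
        for row in format_rows(grouped):
            dst.write(row + "\n")
```

It could be called as `write_grouped("file", "grouped.txt")`. Repeated IPRs are kept, as in the desired output shown in the question. Every IPR in the input is written, so the row for EgrG_000989200.1 would also contain IPR013201, which the sample output in the question leaves out.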