Question

I have two DataFrames, for example:

TAXID

acc_number     taxi 
YP_001378452 2345
YP_001650052 5678
YP_009446812 5435
YP_002192894 7890

and

blast

Nothing  cluster         species     target          score
7101    cluster_000001  species1    YP_001378452.1  31.7    
50457   cluster_000001  species2    YP_001650052.1  27.9    
48798   cluster_000001  species3    YP_002192894.1  34.5    
8514    cluster_000001  species4    YP_009446812.1  28.9

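The idea is to add the taxi column to df2 (blast), BUT as you can see the target values are not exactly the same as acc_number, because in df2 a .1 is appended at the end.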

Let me try to explain it better, in pseudocode:

TAXID=pd.read_table("/pathtoTAXID.txt",header=0)
blast=pd.read_table("/pathtoblast.txt",header=0)


for i in blast["target"]:
    if i in TAXID["acc_number"] without .1:
        add TAXID[taxi] in the line of the blast

I also tried:

for i in blast["target"]:
    print(TAXID.loc[TAXID["Acc_number"] == i.split('.')[0]][1])

But I'm stuck here on keeping only the taxi number. Thanks for your help.

Answer 1

Use s.map() together with s.str.split(), with a dictionary built via dict(zip()):

blast['taxi']=blast.target.str.split(".").str[0].map(dict(zip(TAXID.acc_number,TAXID.taxi)))
print(blast)

   Nothing         cluster   species          target  score  taxi
0     7101  cluster_000001  species1  YP_001378452.1   31.7  2345
1    50457  cluster_000001  species2  YP_001650052.1   27.9  5678
2    48798  cluster_000001  species3  YP_002192894.1   34.5  7890
3     8514  cluster_000001  species4  YP_009446812.1   28.9  5435
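
As a self-contained sketch of this approach, the snippet below rebuilds the two sample frames from the question inline (an assumption, instead of reading them from files) and applies the same map step. The name acc_to_taxi is introduced here only for illustration.

```python
import pandas as pd

# Sample data copied from the question (built inline rather than read from disk).
TAXID = pd.DataFrame({
    "acc_number": ["YP_001378452", "YP_001650052", "YP_009446812", "YP_002192894"],
    "taxi": [2345, 5678, 5435, 7890],
})
blast = pd.DataFrame({
    "cluster": ["cluster_000001"] * 4,
    "species": ["species1", "species2", "species3", "species4"],
    "target": ["YP_001378452.1", "YP_001650052.1", "YP_002192894.1", "YP_009446812.1"],
    "score": [31.7, 27.9, 34.5, 28.9],
})

# Lookup table: accession number (without version suffix) -> taxi.
acc_to_taxi = dict(zip(TAXID["acc_number"], TAXID["taxi"]))

# Drop everything after the first "." in target, then look each key up.
blast["taxi"] = blast["target"].str.split(".").str[0].map(acc_to_taxi)
print(blast)
```

Any target whose prefix is missing from the dictionary ends up as NaN, because map does not raise on unknown keys.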

Answer 2

The magic of replace :-) This only works if every target in blast has a mapping in TAXID.

blast['New']=blast.target.replace(dict(zip(TAXID['acc_number'],TAXID['taxi'])),regex=True)
blast
Out[533]: 
   Nothing         cluster   species          target  score   New
0     7101  cluster_000001  species1  YP_001378452.1   31.7  2345
1    50457  cluster_000001  species2  YP_001650052.1   27.9  5678
2    48798  cluster_000001  species3  YP_002192894.1   34.5  7890
3     8514  cluster_000001  species4  YP_009446812.1   28.9  5435
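
If some targets might have no mapping, one possible alternative (not what this answer uses) is a left merge on the accession number with its version suffix stripped. The sketch below assumes the column names from the question; the second target and the name key are hypothetical, added only to show the unmatched case.

```python
import pandas as pd

TAXID = pd.DataFrame({
    "acc_number": ["YP_001378452", "YP_001650052"],
    "taxi": [2345, 5678],
})
# The second target is a made-up accession with no entry in TAXID.
blast = pd.DataFrame({"target": ["YP_001378452.1", "XX_000000000.1"]})

# Strip the trailing ".<version>" part to get the join key.
key = blast["target"].str.rsplit(".", n=1).str[0]

# how="left" keeps every blast row, matched or not.
merged = blast.assign(acc_number=key).merge(TAXID, on="acc_number", how="left")
print(merged)
```

Unmatched rows get NaN in taxi, which also turns that column into floats; matched rows keep their value.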

Adding a column that depends on 2 DataFrames using Python
