Question

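Good people of the internet! This is my first post here, so please be kind.

I have a DataFrame with a column holding a semicolon-separated list of alleles: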

             Epitope                                  MHC alleles
16         GASPAVSSL      HLA-A*02:01;HLA-A*24:02;HLA-B40;HLA-B57
285  IREFMEKECPFIKPE                HLA-A2;HLA-A28;HLA-B8;HLA-B44
286        VRNIMSPVM                HLA-A2;HLA-A28;HLA-B8;HLA-B44
287        TVWFVPSIK  HLA-A*01:01;HLA-A*02:01;HLA-B57;HLA-B*46:01

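I want to iterate over its rows and duplicate each row once per element in its list. Then, for each newly created row, I want to replace the list in the "MHC alleles" column with a single item from that list.

So far I have tried: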

import pandas as pd

temp_DF = temp_DF[["Description", "MHC alleles"]]

new_rowsDF = pd.DataFrame(columns=temp_DF.columns)
for index, row in temp_DF.iterrows():
    if ";" in row["MHC alleles"]:  # rows with multiple alleles
        alleles = row["MHC alleles"].split(";")

        # keep only valid alleles (those containing "*")
        single_allele = [hla for hla in alleles if "*" in hla]
        if not single_allele:  # no valid alleles: skip this row
            continue

        for alle in single_allele:
            row["MHC alleles"] = alle
            new_rowsDF.loc[index] = row
    else:
        # leave rows that already hold a single allele unchanged
        new_rowsDF.loc[index] = row

display(new_rowsDF)
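
A likely reason the rows from the inner loop do not survive: assigning to `.loc[index]` with a label that already exists replaces that row instead of appending a new one, so only the last allele remains for each index. The snippet below is a minimal sketch of that behaviour, using a hypothetical `demo` frame and hard-coded values taken from row 16 of the sample data. It also shows one possible workaround, which is to collect the rows in a plain list and build the DataFrame once at the end.

```python
import pandas as pd

alleles = ["HLA-A*02:01", "HLA-A*24:02"]

# Writing twice to the same label: the second write replaces the first.
demo = pd.DataFrame(columns=["Epitope", "MHC alleles"])
for allele in alleles:
    demo.loc[16] = ["GASPAVSSL", allele]

print(len(demo))  # one row only, holding the last allele

# Possible workaround: gather rows in a list, then build the frame once.
rows = []
labels = []
for allele in alleles:
    rows.append({"Epitope": "GASPAVSSL", "MHC alleles": allele})
    labels.append(16)  # repeated index labels are allowed here

kept = pd.DataFrame(rows, index=labels)
print(kept)
```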

I feel like I am on the right track, but I cannot manage to keep the rows created in the loop. This is the output I want:

         Epitope    MHC alleles  
16         GASPAVSSL  HLA-A*02:01
16         GASPAVSSL  HLA-A*24:02 
287        TVWFVPSIK  HLA-B*46:01
287        TVWFVPSIK  HLA-A*02:01
287        TVWFVPSIK  HLA-A*01:01
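
One way to get a result of this shape without an explicit loop is to split the string column into lists and call `DataFrame.explode` (available from pandas 0.25). This is only a sketch under some assumptions: the frame `df` below is rebuilt by hand from the sample shown earlier, the column is called "Epitope" as in that sample, and every entry without a "*" is dropped. That last point differs slightly from the loop above, which keeps single-allele rows unconditionally. Row order within each index follows the original list order.

```python
import pandas as pd

# Sample data rebuilt by hand from the table in the question.
df = pd.DataFrame(
    {
        "Epitope": ["GASPAVSSL", "IREFMEKECPFIKPE", "VRNIMSPVM", "TVWFVPSIK"],
        "MHC alleles": [
            "HLA-A*02:01;HLA-A*24:02;HLA-B40;HLA-B57",
            "HLA-A2;HLA-A28;HLA-B8;HLA-B44",
            "HLA-A2;HLA-A28;HLA-B8;HLA-B44",
            "HLA-A*01:01;HLA-A*02:01;HLA-B57;HLA-B*46:01",
        ],
    },
    index=[16, 285, 286, 287],
)

# Turn each "a;b;c" string into a list, then emit one row per list element.
split_col = df["MHC alleles"].str.split(";")
exploded = df.assign(**{"MHC alleles": split_col}).explode("MHC alleles")

# Keep only entries that contain a literal "*" (regex=False avoids escaping).
result = exploded[exploded["MHC alleles"].str.contains("*", regex=False)]
print(result)
```

The original index labels (16, 287) are repeated on the new rows; calling `reset_index(drop=True)` on `result` would replace them with a fresh 0..n-1 index if that is preferred.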

Thanks in advance!

Pandas iterate over rows: add rows based on a column's list of items

0 answers: