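Good people of the internet! This is my first post here, so please be kind.
I have a DataFrame with a column holding a semicolon-separated list of alleles: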
     Epitope          MHC alleles
16   GASPAVSSL        HLA-A*02:01;HLA-A*24:02;HLA-B40;HLA-B57
285  IREFMEKECPFIKPE  HLA-A2;HLA-A28;HLA-B8;HLA-B44
286  VRNIMSPVM        HLA-A2;HLA-A28;HLA-B8;HLA-B44
287  TVWFVPSIK        HLA-A*01:01;HLA-A*02:01;HLA-B57;HLA-B*46:01
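I want to iterate over the original rows and repeat each row once per element in its list. Then, in each newly created row, I want to replace the list in the "MHC alleles" column with a single one of its items.
So far I have tried: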
temp_DF = temp_DF[["Description", "MHC alleles"]]
new_rowsDF = pd.DataFrame(columns=temp_DF.columns)
for index, row in temp_DF.iterrows():
    if ";" in row['MHC alleles']:  # find those rows with multiple alleles
        alleles = row['MHC alleles'].split(";")
        # make a list with only valid alleles (containing *)
        single_allele = [hla for hla in alleles if "*" in hla]
        if not single_allele:  # if the list is empty, ignore
            continue
        for alle in single_allele:
            row["MHC alleles"] = alle
            new_rowsDF.loc[index] = row
    else:
        row["MHC alleles"] = row['MHC alleles']  # leave the ones that were already single alleles
        new_rowsDF.loc[index] = row
display(new_rowsDF)
I feel like I'm on the right track, but I can't keep the rows created inside the loop. This is the output I want:
     Epitope    MHC alleles
16   GASPAVSSL  HLA-A*02:01
16   GASPAVSSL  HLA-A*24:02
287  TVWFVPSIK  HLA-B*46:01
287  TVWFVPSIK  HLA-A*02:01
287  TVWFVPSIK  HLA-A*01:01
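For reference, the rows get lost in the loop above because `new_rowsDF.loc[index] = row` writes to the same index label on every pass, so each allele overwrites the previous one. One possible way to get this kind of output without a loop is sketched below with `str.split` and `DataFrame.explode`. It is only a sketch under assumptions: the sample frame is rebuilt by hand from the table in the question, the column is assumed to be called "Epitope" as printed (the loop selects "Description"), and `explode_alleles` and the temporary `_was_multi` column are hypothetical names made up for illustration.

```python
import pandas as pd

# Sample data rebuilt from the table in the question. The "Epitope" column
# name is an assumption taken from the printed table.
df = pd.DataFrame(
    {
        "Epitope": ["GASPAVSSL", "IREFMEKECPFIKPE", "VRNIMSPVM", "TVWFVPSIK"],
        "MHC alleles": [
            "HLA-A*02:01;HLA-A*24:02;HLA-B40;HLA-B57",
            "HLA-A2;HLA-A28;HLA-B8;HLA-B44",
            "HLA-A2;HLA-A28;HLA-B8;HLA-B44",
            "HLA-A*01:01;HLA-A*02:01;HLA-B57;HLA-B*46:01",
        ],
    },
    index=[16, 285, 286, 287],
)


def explode_alleles(frame, column="MHC alleles"):
    """Return one row per allele (hypothetical helper, not a pandas API).

    Mirrors the loop in the question: rows that held several alleles keep
    only the entries containing "*"; rows with a single allele are kept as-is.
    """
    out = frame.copy()
    # Remember which rows held several alleles before splitting them.
    out["_was_multi"] = out[column].str.contains(";", regex=False)
    # Turn each string into a list, then give every list item its own row.
    out[column] = out[column].str.split(";")
    out = out.explode(column)
    # Keep "*" alleles, plus any row that was a single allele to begin with.
    has_star = out[column].str.contains("*", regex=False)
    out = out[has_star | ~out["_was_multi"]]
    return out.drop(columns="_was_multi")


result = explode_alleles(df)
print(result)
```

With this approach the original index labels (16, 287) repeat on the new rows, and the alleles for each epitope stay in the order they had in the source string, which differs from the order shown for 287 in the desired output.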
Thanks in advance!