I have 20 DataFrames (named sample1 ... sample20). Each one is loaded with:
sample1 = pd.read_table('pathtosample1.csv', sep='\t', index_col=0)["score"]
For each file, I then load the metadata columns again into a separate variable:
meta1 = pd.read_table('pathtosample1.csv', sep='\t', index_col=0).loc[:,['junction_id','splice_site','intron_size', 'anchor','genes','transcripts', 'exons_skipped']]
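The two reads per file could be collapsed into one. This is only a minimal sketch: `load_sample` and `META_COLS` are hypothetical names, and an in-memory string (two rows copied from the sample1 listing) stands in for the real file.

```python
import io
import pandas as pd

# Metadata columns, in the same order as in the meta1 line
META_COLS = ['junction_id', 'splice_site', 'intron_size', 'anchor',
             'genes', 'transcripts', 'exons_skipped']

def load_sample(source):
    """Read one tab-separated sample once; return (score Series, meta DataFrame)."""
    df = pd.read_table(source, sep='\t', index_col=0)
    return df['score'], df.loc[:, META_COLS]

# In-memory stand-in for one file (assumption: real files have this layout)
header = ['Unique', 'junction_id', 'score', 'splice_site', 'anchor',
          'intron_size', 'exons_skipped', 'genes', 'transcripts']
rows = [
    ['3:107915006-107915391(-)', 'ENSMUSG00000000001:E001', 1017, 'GT-AG',
     'DA', 386, 0, 'Gnai3', 'ENSMUST00000000001'],
    ['3:107912225-107912321(-)', 'ENSMUSG00000000001:E002', 10, 'GT-AG',
     'D', 97, 0, 'Gnai3', 'ENSMUST00000000001'],
]
text = '\n'.join('\t'.join(map(str, r)) for r in [header] + rows)

score, meta = load_sample(io.StringIO(text))
print(score)
print(meta)
```

Calling `load_sample` in a loop over a list of paths would avoid keeping 20 separately named variables.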
The file for sample1 looks like this:
Unique junction_id score splice_site anchor intron_size exons_skipped genes transcripts
3:107915006-107915391(-) ENSMUSG00000000001:E001 1017 GT-AG DA 386 0 Gnai3 ENSMUST00000000001
3:107912225-107912321(-) ENSMUSG00000000001:E002 10 GT-AG D 97 0 Gnai3 ENSMUST00000000001
3:107912234-107912321(-) ENSMUSG00000000001:E003 979 GT-AG DA 88 0 Gnai3 ENSMUST00000000001
3:107912530-107914853(-) ENSMUSG00000000001:E004 996 GT-AG DA 2324 0 Gnai3 ENSMUST00000000001
3:107912530-107915391(-) ENSMUSG00000000001:E005 3 GT-AG NDA 2862 1 Gnai3 ENSMUST00000000001
3:107915520-107918681(-) ENSMUSG00000000001:E006 1113 GT-AG DA 3162 0 Gnai3 ENSMUST00000000001
3:107915520-107921219(-) ENSMUSG00000000001:E007 1 GT-AG NDA 5700 1 Gnai3 ENSMUST00000000001
3:107915520-107915944(-) ENSMUSG00000000001:E008 1 GT-AG A 425 0 Gnai3 ENSMUST00000000001
3:107918809-107921219(-) ENSMUSG00000000001:E009 1141 GT-AG DA 2411 0 Gnai3 ENSMUST00000000001
To illustrate, these are the commands I use, shown here for only 6 samples:
concat = pd.concat([sample1,sample2,sample3,sample4,sample5,sample6], axis=1).fillna(0)
concat.columns = ["score_1", "score_2", "score_3","score_4", "score_5", "score_6"]
meta = pd.concat([meta1,meta2,meta3,meta4,meta5,meta6], ignore_index=True)
meta = meta[~meta.index.duplicated(keep='first')]
concat = pd.concat([concat, meta], axis=1)
concat.to_csv('data.csv')
The error I get is:
ValueError: cannot reindex from a duplicate axis
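This error generally shows up when pandas has to align objects whose index contains repeated labels. A quick way to check whether that applies is to list the repeated labels in each object before concatenating. This is a minimal sketch: `duplicated_labels` and the toy Series are hypothetical, not taken from the real data.

```python
import pandas as pd

def duplicated_labels(obj):
    """Return the index labels that occur more than once in a Series or DataFrame."""
    idx = obj.index
    return idx[idx.duplicated(keep=False)].unique().tolist()

# Hypothetical toy data standing in for one sample's score column
toy = pd.Series([1017, 10, 3], index=["a", "b", "a"], name="score")
dups = duplicated_labels(toy)
print(dups)
```

Running the same check on each of sample1 ... sample6 and on meta would show which object, if any, carries repeated labels.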
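In my expected output, the first column holds all the elements found in the first column of every file, followed by one score column per sample and then the remaining metadata columns for each row. Expected output: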
Junction_id score1 score2 score3 score4 score5 score6 Unique splice_site intron_size anchor genes transcripts exons_skipped
ENSMUSG00000000001:E001 1017 1 1651 6 3 1 3:107915006-107915391(-) GT-AG 386 DA Gnai3 ENSMUST00000000001 0
ENSMUSG00000000001:E002 10 7 3 1144 1193 895 3:107912225-107912321(-) GT-AG 97 D Gnai3 ENSMUST00000000001 0
ENSMUSG00000000001:E003 979 1075 1588 923 1223 1017 3:107912234-107912321(-) GT-AG 88 DA Gnai3 ENSMUST00000000001 0
ENSMUSG00000000001:E004 996 3 1522 1 1 2 3:107912530-107914853(-) GT-AG 2324 DA Gnai3 ENSMUST00000000001 0
ENSMUSG00000000001:E005 3 1759 14 1127 4 1112 3:107912530-107915391(-) GT-AG 2862 NDA Gnai3 ENSMUST00000000001 1
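One possible way to build a table of this shape is sketched below. It rests on assumptions not confirmed above: that junction_id is the row key (as the expected output suggests), that it is unique within each file, and that each file is read without `index_col` so that junction_id is an ordinary column. `combine_samples`, `toy` and `META_COLS` are hypothetical names. The toy rows reuse values from the listings above, and leaving E002 out of the second sample and E003 out of the first is made up purely to show the fill with 0.

```python
import pandas as pd

META_COLS = ["Unique", "splice_site", "intron_size", "anchor",
             "genes", "transcripts", "exons_skipped"]

def combine_samples(frames):
    """Combine per-sample tables into one table keyed by junction_id."""
    indexed = [df.set_index("junction_id") for df in frames]

    # One score column per sample; junctions missing from a sample get 0.
    scores = pd.concat(
        {f"score{i}": df["score"] for i, df in enumerate(indexed, start=1)},
        axis=1,
    ).fillna(0)

    # Stack the metadata while keeping the junction_id index
    # (no ignore_index), then keep the first row seen per junction.
    meta = pd.concat([df[META_COLS] for df in indexed])
    meta = meta[~meta.index.duplicated(keep="first")]

    # Both pieces now share a unique junction_id index, so join aligns them.
    return scores.join(meta)

def toy(rows):
    """Build a small stand-in for one sample file (hypothetical helper)."""
    cols = ["Unique", "junction_id", "score", "splice_site", "anchor",
            "intron_size", "exons_skipped", "genes", "transcripts"]
    return pd.DataFrame(rows, columns=cols)

E001 = ["3:107915006-107915391(-)", "ENSMUSG00000000001:E001"]
E002 = ["3:107912225-107912321(-)", "ENSMUSG00000000001:E002"]
E003 = ["3:107912234-107912321(-)", "ENSMUSG00000000001:E003"]
tail = ["Gnai3", "ENSMUST00000000001"]

s1 = toy([E001 + [1017, "GT-AG", "DA", 386, 0] + tail,
          E002 + [10, "GT-AG", "D", 97, 0] + tail])
s2 = toy([E001 + [1, "GT-AG", "DA", 386, 0] + tail,
          E003 + [1075, "GT-AG", "DA", 88, 0] + tail])

result = combine_samples([s1, s2])
print(result)
```

Keeping the same index on both the score table and the metadata table is the design choice here. Stacking the metadata with `ignore_index=True` would replace the labels with 0, 1, 2, ..., which no longer match the score table.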
I am not sure which step causes this error.