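My code:
import pandas as pd

data = pd.read_csv('input_file', header=None, delimiter="\t", names=['chr', 'sTSS', 'eTSS', 'gene', 'clust1', 'clust2'])
# Keep only the rows whose clust2 value appears more than once
dup_clust2 = data.groupby('clust2').filter(lambda x: len(x) > 1)
# Iterating over a groupby yields (group key, sub-DataFrame) pairs
for element in dup_clust2.groupby('clust2'):
    print(element)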
Input:
chr2 166760255 166760255 Cse1l_tss10 52 5426
chr2 166760282 166760282 Cse1l_tss9 52 5426
chr2 166885599 166886548 IRF8 150.18 5431
chr2 166885925 166885925 Znfx1_tss1 52 5433
Output:
(5426,     chr       sTSS       eTSS         gene  clust1  clust2
0  chr2  166760255  166760255  Cse1l_tss10    52.0    5426
1  chr2  166760282  166760282   Cse1l_tss9    52.0    5426)
Required output: the two rows split into separate tuples, one tuple per row:
(0 chr2 166760255 166760255 Cse1l_tss10 52.0 5426)
(1 chr2 166760282 166760282 Cse1l_tss9 52.0 5426)
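
One possible way to get one tuple per row is sketched below. It assumes the input really is tab-separated (as the `read_csv` call implies) and rebuilds the four sample rows in memory so the snippet runs on its own. The name `row_tuples` is only illustrative. `DataFrame.duplicated(..., keep=False)` is used here as an alternative to the `groupby(...).filter(...)` step, and `itertuples(name=None)` returns plain tuples with the row index as the first element.

```python
import io
import pandas as pd

# Sample rows from the question, assumed to be tab-separated.
raw = (
    "chr2\t166760255\t166760255\tCse1l_tss10\t52\t5426\n"
    "chr2\t166760282\t166760282\tCse1l_tss9\t52\t5426\n"
    "chr2\t166885599\t166886548\tIRF8\t150.18\t5431\n"
    "chr2\t166885925\t166885925\tZnfx1_tss1\t52\t5433\n"
)
cols = ['chr', 'sTSS', 'eTSS', 'gene', 'clust1', 'clust2']
data = pd.read_csv(io.StringIO(raw), header=None, delimiter="\t", names=cols)

# Keep every row whose clust2 value occurs more than once.
dup_clust2 = data[data.duplicated('clust2', keep=False)]

# One plain tuple per row; index=True puts the row label first.
row_tuples = list(dup_clust2.itertuples(index=True, name=None))

for row in row_tuples:
    print(row)
```

If the rows need to stay grouped by `clust2`, the same `itertuples(index=True, name=None)` call can be applied to each sub-DataFrame inside the original `for key, group in dup_clust2.groupby('clust2'):` loop.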