chrA_x ens_geneA geneA chrB ens_geneB geneB
chr1:92092600 ENSG00000189195 BTBD8 chr2:164084669 ENSG00000237844 AC016766.1
chr1:121498879 ENSG00000233432 AL592 chr9:2781522 ENSG00000080608 PUM3
chr1:200152569 ENSG00000116833 NR5A2 chr7:112680583 ENSG00000223646 AC002463.1
chr1:205618297 ENSG00000158711 ELK4 chr7:32968816 ENSG00000122642 FKBP9
chr1:92092600 ENSG00000189195 BTBD8 chr2:164084669 ENSG00000237844 AC016766.1
chr1:92092600 ENSG00000189195 BTBD8 chr9:2781522 ENSG00000080608 PUM3
Expected output:
chrA_x ens_geneA geneA chrB ens_geneB geneB
chr1:92092600 ENSG00000189195 BTBD8 chr2:164084669 ENSG00000237844 AC016766.1
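So far, my code only returns rows where the values in geneA and geneB are each duplicated on their own, not rows where the combination is duplicated: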
import pandas as pd
import numpy as np
pd.options.display.max_colwidth = 100
pd.set_option('display.max_columns', None)
df = pd.read_excel("data.xlsx")
dups = np.logical_and((df[df.duplicated(['geneA'])]), (df[df.duplicated(['geneB'])]))
Answer 0 (score: 2)
You should first combine the columns and then test whether that combination is duplicated. Assuming the fields contain no commas, you can use:
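The answer's original snippet is not preserved here, so the following is only a minimal sketch of the approach it describes: join geneA and geneB with a comma and test the joined string for duplicates. The small DataFrame is built from the sample rows in the question instead of reading data.xlsx, and the names combo, dups and dups_subset are illustrative assumptions.

```python
import pandas as pd

# Sample rows copied from the question (geneA / geneB columns only),
# used here in place of pd.read_excel("data.xlsx").
df = pd.DataFrame({
    "geneA": ["BTBD8", "AL592", "NR5A2", "ELK4", "BTBD8", "BTBD8"],
    "geneB": ["AC016766.1", "PUM3", "AC002463.1", "FKBP9", "AC016766.1", "PUM3"],
})

# Join the two columns with a comma. This is only unambiguous
# if no field itself contains a comma.
combo = df["geneA"] + "," + df["geneB"]

# duplicated() flags every occurrence after the first, so each
# repeated pair appears once in the result.
dups = df[combo.duplicated()]

# Alternative that avoids the joined string: check both columns together.
dups_subset = df[df.duplicated(["geneA", "geneB"])]

print(dups)
```

With the sample rows this selects the single repeated BTBD8 / AC016766.1 pair. Passing keep=False to duplicated() would instead return every occurrence of a repeated pair, which may be preferable if all matching rows are needed.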