I have 4 DataFrames:
symptoms = pd.DataFrame(columns=['subject', 'name', 'type'])
AA_alleles = pd.DataFrame(columns=['subject', 'chrom', 'pos', 'bp', 'SNPid'])
Aa_alleles = pd.DataFrame(columns=['subject', 'chrom', 'pos', 'bp', 'SNPid'])
aa_alleles = pd.DataFrame(columns=['subject', 'chrom', 'pos', 'bp', 'SNPid'])
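The 'subject' column in every DataFrame holds the unique subject ID. I want to find every combination of symptom and allele, the number of subjects for each combination, and a zygosity column (taken from the name of each DataFrame: AA, Aa, or aa). For example, the result I'm looking for has these columns: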
['symptom_name', 'symptom_type', 'zygosity', 'chrom', 'pos', 'bp', 'SNPid', 'subject_count']
What is the best way to do this transformation? For reference, this is to prepare data for a Freeman-Halton test.
Answer 0 (score: 0)
I think I've solved it:
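import pandas as pd

# Tag each allele table with its zygosity
AA_alleles['zygosity'] = 'AA'
Aa_alleles['zygosity'] = 'Aa'
aa_alleles['zygosity'] = 'aa'
# Attach each subject's symptoms to that subject's alleles
df = symptoms.merge(AA_alleles, on='subject')
df2 = symptoms.merge(Aa_alleles, on='subject')
df3 = symptoms.merge(aa_alleles, on='subject')
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the frames instead
total = pd.concat([df, df2, df3], ignore_index=True)
# Count the rows (subjects) for each allele/zygosity/symptom combination
count = total.groupby(['chrom','pos','bp','SNPid','zygosity','name','type']).count()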
This seems to do the trick:
Chromosome_X 2724760 T rs55842969;rs111382948 Aa Albumin high 1
BMI high 1
FEV1 high 1
low 1
FVC high 1
... ... ... ... ... ... ... ...
Chromosome_9 141016320 A rs41290003 Aa Trunkal Fat Mass high 1
Urea high 1
VLDL high 1
WHR high 2
Weight high 1
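The result above is indexed by the grouping columns rather than laid out as the flat table the question asks for. A minimal sketch of one way to get that layout is below. The toy data is invented purely for illustration, and the `frames` dict and the renamed `symptom_name` / `symptom_type` columns are assumptions taken from the column list in the question, not part of the original answer. It also counts distinct subjects with `nunique`, which only differs from a plain row count if a subject can appear more than once for the same combination.

```python
import pandas as pd

# Toy data, invented for illustration only.
symptoms = pd.DataFrame({
    'subject': [1, 2, 3, 4],
    'name': ['BMI', 'BMI', 'WHR', 'BMI'],
    'type': ['high', 'high', 'high', 'high'],
})
cols = ['subject', 'chrom', 'pos', 'bp', 'SNPid']
AA_alleles = pd.DataFrame([[1, 'Chromosome_9', 100, 'A', 'rs1'],
                           [2, 'Chromosome_9', 100, 'A', 'rs1']], columns=cols)
Aa_alleles = pd.DataFrame([[3, 'Chromosome_9', 100, 'A', 'rs1']], columns=cols)
aa_alleles = pd.DataFrame([[4, 'Chromosome_9', 100, 'A', 'rs1']], columns=cols)

# Stack the three allele tables, tagging each row with its zygosity.
frames = {'AA': AA_alleles, 'Aa': Aa_alleles, 'aa': aa_alleles}
alleles = pd.concat(
    [df.assign(zygosity=z) for z, df in frames.items()],
    ignore_index=True,
)

# One merge pairs every symptom of a subject with every allele of that subject.
merged = symptoms.merge(alleles, on='subject').rename(
    columns={'name': 'symptom_name', 'type': 'symptom_type'}
)

group_cols = ['symptom_name', 'symptom_type', 'zygosity',
              'chrom', 'pos', 'bp', 'SNPid']

# Count distinct subjects per combination and flatten back to ordinary columns.
result = (
    merged.groupby(group_cols)['subject']
    .nunique()
    .reset_index(name='subject_count')
)
print(result)
```

Combinations with no subjects simply do not appear in `result`; if the downstream test needs explicit zero counts, they would have to be filled in separately.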