In a Pandas column

Time: 2018-01-23 22:17:20

Tags: python pandas

I know I have read about a way to do what I'm looking for, but I can't seem to find it.

I have a pandas DataFrame like this:

       Chrom        Loc WT Var Change ConvChange  AO     DP          VAF  \
0       chr1  115227855  T   A    T>A        T>A   5  19346  0.000258451   

      IntEx   Gene Upstream Downstream Individual  
0      TIII  TIIIa        T          C          1

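I want to find, for each unique Chrom-Loc-Change combination, the largest variation in VAF across Individuals.

I'm thinking of reshaping my current DataFrame so that it looks like the table below, which gives me a standard deviation column that I can sort to find the positions with the largest variation:

                              1            2            3         Mean           Std
chr1-115227855-T>A  0.000258451  0.000548128  0.000789456  0.000532011  0.0002170812

Is this a good approach, and is there a good way to do something like this?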


1 Answer:

Answer 0: (score: 1)

You can use some Pandas reshaping:

MCVE:

Given:

print(df)

  Chrom  Individual       VAF Var WT
0  chr1           1  0.076397   A  T
1  chr1           2  0.964344   A  T
2  chr1           3  0.563713   A  T
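
To make the example reproducible, the frame printed above could be built as follows. This is a minimal sketch: the values are copied from the printed output, and the column order simply mirrors it.

```python
import pandas as pd

# Sample data copied from the printed frame above
df = pd.DataFrame({
    "Chrom": ["chr1", "chr1", "chr1"],
    "Individual": [1, 2, 3],
    "VAF": [0.076397, 0.964344, 0.563713],
    "Var": ["A", "A", "A"],
    "WT": ["T", "T", "T"],
})
print(df)
```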

Reshape and aggregate:

# Move Individual from the index into columns, then add row-wise mean and std
df.set_index(['Chrom','WT','Var','Individual'])['VAF'].unstack(-1)\
  .pipe(lambda x: x.assign(mean=x.mean(axis=1), std=x.std(axis=1)))

Output:

Individual           1         2         3      mean       std
Chrom WT Var                                                  
chr1  T  A    0.076397  0.964344  0.563713  0.534818  0.444678
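
The same reshaping can be keyed on the Chrom, Loc and Change columns from the question and then sorted by the standard deviation. The sketch below is one possible way to do that, not the only one. It assumes the data is in long format with one row per site and Individual. The first site reuses the VAF values from the question, while the second site is made up purely so that the sort has something to order.

```python
import pandas as pd

# Hypothetical long-format data: one row per (Chrom, Loc, Change, Individual).
# The second site (115227900, C>G) is invented for illustration only.
df = pd.DataFrame({
    "Chrom": ["chr1"] * 6,
    "Loc": [115227855] * 3 + [115227900] * 3,
    "Change": ["T>A"] * 3 + ["C>G"] * 3,
    "Individual": [1, 2, 3, 1, 2, 3],
    "VAF": [0.000258451, 0.000548128, 0.000789456, 0.0010, 0.0011, 0.0012],
})

keys = ["Chrom", "Loc", "Change"]

# One row per site, one column per Individual
wide = df.set_index(keys + ["Individual"])["VAF"].unstack("Individual")

# Both statistics are computed from the Individual columns only,
# so the mean does not leak into the std calculation
stats = wide.assign(
    Mean=wide.mean(axis=1),
    Std=wide.std(axis=1),  # sample std (ddof=1), the pandas default
)

# Sites with the largest spread across individuals come first
ranked = stats.sort_values("Std", ascending=False)
print(ranked)
```

Note that `std` in pandas defaults to the sample standard deviation (`ddof=1`). Passing `ddof=0` gives the population standard deviation instead, which is what the Std value in the question's sketch (0.0002170812) appears to correspond to.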