I know I have read about a way to do what I'm looking for, but I can't seem to find it.
I have a pandas DataFrame like this:
Chrom Loc WT Var Change ConvChange AO DP VAF \
0 chr1 115227855 T A T>A T>A 5 19346 0.000258451
IntEx Gene Upstream Downstream Individual
0 TIII TIIIa T C 1
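I'd like to get an idea of the greatest differences in VAF that exist among Individual for each unique combination of Chrom, Loc, and Change.
I was thinking about reshaping the current DataFrame so that it looks like the one below, which would give me a standard deviation column that I could sort on to find the locations with the greatest differences. Is this a good approach, or is there a better way to do something like this?
                    1            2            3            Mean         Std
chr1-115227855-T>A  0.000258451  0.000548128  0.000789456  0.000532011  0.0002170812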
Answer 0 (score: 1)
You can use some pandas reshaping:
MCVE:
Given:
print(df)
  Chrom  Individual       VAF Var WT
0  chr1           1  0.076397   A  T
1  chr1           2  0.964344   A  T
2  chr1           3  0.563713   A  T
Reshape and aggregate:
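# One row per (Chrom, WT, Var) and one column per Individual, then add the row-wise mean and std.
df.set_index(['Chrom', 'WT', 'Var', 'Individual'])['VAF'].unstack(-1)\
  .pipe(lambda x: x.assign(mean=x.mean(axis=1), std=x.std(axis=1)))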
Output:
Individual           1         2         3      mean       std
Chrom WT Var
chr1  T  A    0.076397  0.964344  0.563713  0.534818  0.444678
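To get from that table to the ranking the question asks for, the same reshape can be keyed on Chrom, Loc and Change and then sorted by the Std column. The following is only a sketch under some assumptions: the chr1 VAF values are copied from the question's desired-output table, the chr2 site is invented for illustration, and `pivot_table` is used in place of `set_index`/`unstack` (with its default aggregation it averages duplicate rows for the same site and individual). The Std value shown in the question appears to match a population standard deviation, so `ddof=0` is passed here; the default for pandas `std` is `ddof=1`.

```python
import pandas as pd

# Hypothetical sample data: the chr1 values come from the question's
# desired-output table; the chr2 site is made up for illustration.
df = pd.DataFrame({
    "Chrom": ["chr1"] * 3 + ["chr2"] * 3,
    "Loc": [115227855] * 3 + [1000] * 3,
    "Change": ["T>A"] * 3 + ["C>G"] * 3,
    "Individual": [1, 2, 3, 1, 2, 3],
    "VAF": [0.000258451, 0.000548128, 0.000789456, 0.01, 0.01, 0.01],
})

# One row per (Chrom, Loc, Change), one column per Individual.
wide = df.pivot_table(index=["Chrom", "Loc", "Change"],
                      columns="Individual", values="VAF")

# Remember the per-individual columns so the summary columns added
# below are not included in their own calculation.
individuals = wide.columns.tolist()
wide["Mean"] = wide[individuals].mean(axis=1)
# ddof=0 is an assumption chosen to match the Std in the question;
# drop it to get the sample standard deviation (ddof=1).
wide["Std"] = wide[individuals].std(axis=1, ddof=0)

# Sites with the largest spread between individuals come first.
ranked = wide.sort_values("Std", ascending=False)
print(ranked)
```

If a site is missing for some individual, the corresponding cell is NaN and is skipped by `mean` and `std` by default, so it may be worth checking the per-row counts before trusting the ranking.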