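I got this DataFrame from a pivot operation, and I don't know how to work with "nested" or multi-indexed DataFrames in pandas.
The DataFrame looks like the example below, only with many more rows than shown here. [Edit: added an extra "chr18" row to make the example more illustrative. This also needs to be filtered out.]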
mmc  chrom  start     stop      experiment   isdone  strand  countL  countR
3    chr18  2044696   2044716   hj-10_b_10   FALSE   -       12      12
            2060000   2061000   hj-10_b_10   FALSE   -       162     162
     chr3   95359191  95359212  hj-10_b_10   FALSE   -       2497    2497
                                hj-9_b_9     TRUE    -       3476    3477
                                hj1_100_3    TRUE    -       2351    2351
4    chr19  598940    598961    hj-10_b_10   FALSE   -       494     494
                                hj1_100_3*1  TRUE    -       211     211
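I want to filter this DataFrame down to the "chrom" entries that have more than one entry at the experiment level, i.e. select every chrom, start, stop combination that has multiple entries in the experiment index level.
This is the DataFrame I want as the result (note that it has no mmc: 3, chrom: chr18 entries, because each of those two entries has only one experiment, "hj-10_b_10", and so is not replicated multiple times).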
mmc  chrom  start     stop      experiment   isdone  strand  countL  countR
3    chr3   95359191  95359212  hj-10_b_10   FALSE   -       2497    2497
                                hj-9_b_9     TRUE    -       3476    3477
                                hj1_100_3    TRUE    -       2351    2351
4    chr19  598940    598961    hj-10_b_10   FALSE   -       494     494
                                hj1_100_3*1  TRUE    -       211     211
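I could do this outside of pandas, but I want to learn the pandas way.
How can I select, from a very large DataFrame, all entries that have more than a given count at a particular index level?
Update
You can create the MultiIndex DataFrame with this code: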
import pandas as pd

# One list per index level; position i across all lists describes row i.
index_tuples_mmc = [3, 3, 3, 3, 3, 4, 4]
index_tuples_chrom = ["chr18", "chr18", "chr3", "chr3", "chr3", "chr19", "chr19"]
index_tuples_start = ["2044696", "2060000", "95359191", "95359191", "95359191", "598940", "598940"]
index_tuples_stop = ["2044716", "2061000", "95359212", "95359212", "95359212", "598961", "598961"]
index_tuples_experiment = ["hj-10_b_10", "hj-10_b_10", "hj-10_b_10", "hj-9_b_9", "hj1_100_3", "hj-10_b_10", "hj1_100_3*1"]
index_tuples_isdone = ["FALSE", "FALSE", "FALSE", "TRUE", "TRUE", "FALSE", "TRUE"]
index_tuples_strand = ["-", "-", "-", "-", "-", "-", "-"]
arrays = [index_tuples_mmc, index_tuples_chrom, index_tuples_start,
          index_tuples_stop, index_tuples_experiment, index_tuples_isdone,
          index_tuples_strand]
# Transpose the per-level lists into one tuple per row.
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(
    tuples,
    names=["mmc", "chrom", "start", "stop", "experiment", "isdone", "strand"],
)
df2 = pd.DataFrame([12, 162, 2497, 3476, 2351, 494, 211], index=index, columns=["countL"])
# countR simply mirrors countL in this example.
df2["countR"] = df2["countL"]
Answer 0 (score: 0)
You can try this, which selects the wanted chrom values by name:
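import pandas as pd
idx = pd.IndexSlice
# One slicer per index level (mmc, chrom, start, stop, experiment, isdone, strand).
df2.loc[idx[:, ['chr3', 'chr19'], :, :, :, :, :], :]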
To learn more about MultiIndex / advanced indexing, see http://pandas.pydata.org/pandas-docs/stable/advanced.html
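The slice above names the chrom values by hand, which may not scale to a large frame. As a possible alternative (a minimal sketch, not part of the original answer), the rows could be selected by counting how many rows share the same (mmc, chrom, start, stop) combination and keeping only the combinations that occur more than once. It assumes each row within such a combination corresponds to a distinct experiment, as in the example data; the names `keys`, `group_sizes`, and `result` are illustrative.

```python
import pandas as pd

# Rebuild the example frame from the question so the sketch is self-contained.
rows = [
    (3, "chr18", "2044696", "2044716", "hj-10_b_10", "FALSE", "-"),
    (3, "chr18", "2060000", "2061000", "hj-10_b_10", "FALSE", "-"),
    (3, "chr3", "95359191", "95359212", "hj-10_b_10", "FALSE", "-"),
    (3, "chr3", "95359191", "95359212", "hj-9_b_9", "TRUE", "-"),
    (3, "chr3", "95359191", "95359212", "hj1_100_3", "TRUE", "-"),
    (4, "chr19", "598940", "598961", "hj-10_b_10", "FALSE", "-"),
    (4, "chr19", "598940", "598961", "hj1_100_3*1", "TRUE", "-"),
]
index = pd.MultiIndex.from_tuples(
    rows,
    names=["mmc", "chrom", "start", "stop", "experiment", "isdone", "strand"],
)
df2 = pd.DataFrame({"countL": [12, 162, 2497, 3476, 2351, 494, 211]}, index=index)
df2["countR"] = df2["countL"]

# Index levels that identify one genomic interval (an assumption about
# what "the same entry" means in the question).
keys = ["mmc", "chrom", "start", "stop"]

# For every row, the number of rows that share its interval.
group_sizes = df2.groupby(level=keys)["countL"].transform("size")

# Keep only rows whose interval appears more than once; the mask is passed
# as a plain array so no index alignment is involved.
result = df2[(group_sizes > 1).to_numpy()]
print(result)
```

On the example data this drops both chr18 rows, since each of their intervals occurs only once, and keeps the chr3 and chr19 rows. If the same experiment could appear more than once within an interval, counting unique values of the experiment level would be needed instead of the group size.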