Python Pandas: selecting a random sample of groups from a groupby

Date: 2015-09-01 20:46:26

Tags: python pandas random group-by

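What is the best way to get a random sample of the elements of a groupby? As I understand it, a groupby is just an iterable of groups.

If I wanted to select N = 200 elements from an iterable, the standard way I would do it is: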

rand = random.sample(data, N)  

If you try the above where data is a 'grouped' object, the elements of the resulting list are tuples for some reason.
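The tuples appear because iterating a groupby yields (key, group) pairs. A minimal sketch of this, using a small made-up frame (the data and the fixed seed are assumptions for illustration only):

```python
import random
import pandas as pd

# Illustrative data, not from the question.
df = pd.DataFrame({'some_key': [0, 1, 2, 3, 0, 1, 2, 3],
                   'val':      [1, 2, 3, 4, 1, 5, 1, 5]})
grouped = df.groupby('some_key')

# Iterating a GroupBy yields (key, sub-DataFrame) pairs.
pairs = list(grouped)
first_key, first_group = pairs[0]

# Sampling the list of pairs therefore returns tuples, not bare DataFrames.
random.seed(0)  # fixed seed only so the sketch is repeatable
sampled = random.sample(pairs, 2)

# Drop the keys if only the group DataFrames are wanted.
sampled_groups = [group for _key, group in sampled]
```

So the tuples are expected; unpacking them recovers the group dataframes.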

I found the example below for randomly selecting the elements of a single-key groupby, but it does not work with a multi-key groupby. From: How to access pandas groupby dataframe by key

  

Create the groupby object

grouped = df.groupby('some_key')
     

Select N dataframes and grab their indices

sampled_df_i = random.sample(list(grouped.indices), N)  # list(): random.sample needs a sequence, not a dict
     

Use the groupby object's 'get_group' method to grab the groups

df_list = [grouped.get_group(df_i) for df_i in sampled_df_i]
     

Optional: turn it all back into a single dataframe object

sampled_df = pd.concat(df_list, axis=0, join='outer')
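For reference, the four steps above can be put together as one runnable sketch. The helper name sample_groups and its seed parameter are hypothetical, added here only for illustration. As far as I can tell, the same steps also cope with several keys when the key list has two or more columns, because the entries of grouped.indices are then tuples that get_group accepts:

```python
import random
import pandas as pd

def sample_groups(df, keys, n, seed=None):
    """Return the rows of n randomly chosen groups (hypothetical helper)."""
    grouped = df.groupby(keys)
    rng = random.Random(seed)
    # list(...) because random.sample needs a sequence, not a dict.
    chosen = rng.sample(list(grouped.indices), n)
    # get_group takes a scalar for one key, a tuple for several keys.
    return pd.concat([grouped.get_group(k) for k in chosen], axis=0)

# Illustrative data with two key columns.
df = pd.DataFrame({'some_key1': [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3],
                   'some_key2': [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
                   'val':       [1, 2, 3, 4, 1, 5, 1, 5, 1, 6, 7, 8]})

one_key = sample_groups(df, 'some_key1', 2, seed=0)
two_keys = sample_groups(df, ['some_key1', 'some_key2'], 2, seed=0)
```

Passing a seed only makes the draw repeatable; leave it as None for a fresh sample each call.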

2 answers:

Answer 0 (score: 8)

You can take a random sample of the unique values from df.some_key.unique(), use that to slice the df, and finally groupby on the result:

In [337]:

df = pd.DataFrame({'some_key': [0,1,2,3,0,1,2,3,0,1,2,3],
                   'val':      [1,2,3,4,1,5,1,5,1,6,7,8]})
In [338]:

print(df[df.some_key.isin(random.sample(list(df.some_key.unique()), 2))].groupby('some_key').mean())
               val
some_key          
0         1.000000
2         3.666667

If there is more than one groupby key:

In [358]:

df = pd.DataFrame({'some_key1':[0,1,2,3,0,1,2,3,0,1,2,3],
                   'some_key2':[0,0,0,0,1,1,1,1,2,2,2,2],
                   'val':      [1,2,3,4,1,5,1,5,1,6,7,8]})
In [359]:

gby = df.groupby(['some_key1', 'some_key2'])
In [360]:

print(gby.mean().loc[random.sample(list(gby.indices.keys()), 2)])
                     val
some_key1 some_key2     
1         1            5
3         2            8
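If the rows of the sampled groups are wanted rather than an aggregate, one possible variation (not part of the answer above) is to label each row with its group number via ngroup() and sample those numbers. This is only a sketch; the fixed seed and the variable names are assumptions:

```python
import random
import pandas as pd

df = pd.DataFrame({'some_key1': [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3],
                   'some_key2': [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
                   'val':       [1, 2, 3, 4, 1, 5, 1, 5, 1, 6, 7, 8]})
gby = df.groupby(['some_key1', 'some_key2'])

# ngroup() labels every row with the integer id (0 .. ngroups-1) of its group.
group_ids = gby.ngroup()

# Sample group ids instead of key tuples; works for any number of keys.
rng = random.Random(0)  # fixed seed only so the sketch is repeatable
chosen_ids = rng.sample(range(gby.ngroups), 2)

# Keep the rows that belong to the chosen groups.
sampled = df[group_ids.isin(chosen_ids)]
```

This avoids building key tuples by hand, at the cost of one extra integer Series the length of df.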

But if you are just going to get the values of each group, you don't even need to groupby; a MultiIndex will do:

In [372]:

idx = random.sample(list(set(pd.MultiIndex.from_product((df.some_key1, df.some_key2)).tolist())),
                    2)
print(df.set_index(['some_key1', 'some_key2']).loc[idx])
                     val
some_key1 some_key2     
2         0            3
3         1            5
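A related sketch that stays with plain columns instead of a MultiIndex: sample the distinct key combinations with DataFrame.sample and merge back. This is an alternative I am suggesting, not the answerer's code, and random_state=0 is only there to make it repeatable. Unlike from_product, drop_duplicates only yields key pairs that actually occur in the data:

```python
import pandas as pd

df = pd.DataFrame({'some_key1': [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3],
                   'some_key2': [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
                   'val':       [1, 2, 3, 4, 1, 5, 1, 5, 1, 6, 7, 8]})
keys = ['some_key1', 'some_key2']

# One row per distinct key combination that actually occurs in df.
combos = df[keys].drop_duplicates()

# Pick two combinations at random.
picked = combos.sample(n=2, random_state=0)

# The inner merge keeps only rows whose key pair was picked.
sampled = df.merge(picked, on=keys, how='inner')
```

Note that merge resets the index, so the original row labels are not preserved.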

Answer 1 (score: 0)

I think the lower-level numpy operations are cleaner.
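The code for this answer is not available here. As a rough sketch of what a numpy-based approach could look like (my own assumption, not the answerer's code), one can draw keys without replacement and build a boolean mask:

```python
import numpy as np
import pandas as pd

# Illustrative data, same shape as the single-key example in answer 0.
df = pd.DataFrame({'some_key': [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3],
                   'val':      [1, 2, 3, 4, 1, 5, 1, 5, 1, 6, 7, 8]})

rng = np.random.default_rng(0)  # fixed seed only so the sketch is repeatable

# Distinct key values as a plain numpy array.
key_values = df['some_key'].to_numpy()
unique_keys = np.unique(key_values)

# Draw two distinct keys.
chosen = rng.choice(unique_keys, size=2, replace=False)

# Boolean mask of the rows whose key was drawn.
mask = np.isin(key_values, chosen)
sampled = df[mask]
```

The result can then be passed to groupby as usual, for example sampled.groupby('some_key').mean().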