How do I sample a pandas DataFrame on multiple columns?

Time: 2019-12-05 19:49:00

Tags: python pandas dataframe

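I have a DataFrame with about 8 million observations. I need to draw a sample from it, but I want the sample to be drawn across multiple columns.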

I tried the following, which does not work:

import pandas as pd

state = ['mi', 'mi', 'mi', 'nc', 'pa', 'pa', 'ga']
state = state * 50
age = ['21', '22', '23', '23', '23', '50', '50']
age = age * 50
random = ['.445', '.324', '.234', '.143', '.568', '.777', '.256']
random = random * 50
data = {'state':state, 'age': age, 'random': random}
df = pd.DataFrame.from_dict(data = data)

df_sample = df.sample(n = 25, weights = ['state', 'age'], random_state = 48)

I realize the pandas documentation does not say that what I want to do is possible. Is there a way to do it?

1 answer:

Answer 0: (score: 2)

IIUC,

I think you are trying to achieve the following:

df_sample = df[['state','age']].sample(n = 25, random_state = 48)
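
The question can be read more than one way, so here is a minimal sketch of three possible readings on the question's toy data. The first is the line above. The other two are assumptions about what "sampling on multiple columns" might mean, not something the question confirms: drawing a fixed number of rows from each (state, age) combination, and passing per-row weights derived from those two columns to `weights`. The names `cols_sample`, `strat_sample`, `weighted_sample` and `group_size`, and the choice of 3 rows per group, are illustrative only. `DataFrameGroupBy.sample` needs pandas 1.1 or later.

```python
import pandas as pd

# Rebuild the toy frame from the question (7 base rows repeated 50 times).
state = ['mi', 'mi', 'mi', 'nc', 'pa', 'pa', 'ga'] * 50
age = ['21', '22', '23', '23', '23', '50', '50'] * 50
random = ['.445', '.324', '.234', '.143', '.568', '.777', '.256'] * 50
df = pd.DataFrame({'state': state, 'age': age, 'random': random})

# Reading 1 (the answer above): keep only the two columns, then draw 25 rows.
cols_sample = df[['state', 'age']].sample(n=25, random_state=48)

# Reading 2 (assumption): stratified sample, a fixed number of rows
# from every distinct (state, age) combination.
strat_sample = df.groupby(['state', 'age']).sample(n=3, random_state=48)

# Reading 3 (assumption): one weight per row, here the inverse of the row's
# (state, age) group size, so each combination has the same total chance
# of being drawn. `weights` takes one value per row, not a list of column names.
group_size = df.groupby(['state', 'age'])['state'].transform('size')
weighted_sample = df.sample(n=25, weights=1 / group_size, random_state=48)
```

On the 8-million-row frame, `frac=` can be used in place of `n=` in either `sample` call if a proportional sample is more convenient than a fixed count.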