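I have a DataFrame with roughly 8 million observations. I need to draw a sample from it, but I want the sample to be based on several columns.
I tried the following, which does not work: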
import pandas as pd
state = ['mi', 'mi', 'mi', 'nc', 'pa', 'pa', 'ga']
state = state * 50
age = ['21', '22', '23', '23', '23', '50', '50']
age = age * 50
random = ['.445', '.324', '.234', '.143', '.568', '.777', '.256']
random = random * 50
data = {'state':state, 'age': age, 'random': random}
df = pd.DataFrame.from_dict(data = data)
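# Fails: `weights` expects a single column name or numeric per-row weights, not a list of column names
df_sample = df.sample(n = 25, weights = ['state', 'age'], random_state = 48)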
I realize the pandas documentation does not indicate that what I want to do is possible. Is there a way to do it?
Answer 0 (score: 2)
IIUC, I think you are trying to achieve the following:
df_sample = df[['state','age']].sample(n = 25, random_state = 48)
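If the goal is instead to sample within each combination of `state` and `age` (a stratified sample), or to weight rows by a numeric column, the following is a minimal sketch rather than a definitive answer. It assumes pandas 1.1 or later (for `DataFrameGroupBy.sample`), rebuilds the question's toy data, and the names `stratified` and `weighted` are only illustrative.

```python
import pandas as pd

# Same toy data as in the question
state = ['mi', 'mi', 'mi', 'nc', 'pa', 'pa', 'ga'] * 50
age = ['21', '22', '23', '23', '23', '50', '50'] * 50
random = ['.445', '.324', '.234', '.143', '.568', '.777', '.256'] * 50
df = pd.DataFrame({'state': state, 'age': age, 'random': random})

# Stratified sample: draw 10% of the rows from every (state, age) group
stratified = df.groupby(['state', 'age']).sample(frac=0.1, random_state=48)

# Weighted sample: `weights` takes numeric per-row weights, so the
# string column is converted to float first (pandas normalizes the weights)
weighted = df.sample(n=25, weights=df['random'].astype(float), random_state=48)
```

On the 8-million-row frame, `frac` (or `n`) in the grouped call controls how many rows come from each group, so small groups are not crowded out by large ones.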