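I'm working with a user testing group that needs equal representation from the group based on two variables.
I've done it the hard way by manually splitting the data out and recombining it, and I'm hoping there is a simple method I've overlooked.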
import pandas as pd
import numpy as np
df = pd.read_csv("Desktop/Explorer_Data.csv")
males = df[df["Gender"] == "Male"]
male_distanced = males[males["member_segmentation"] == "Distanced"][:18750]
male_expert = males[males["member_segmentation"] == "Expert"][:18750]
male_learner = males[males["member_segmentation"] == "Learner"][:18750]
male_receptive = males[males["member_segmentation"] == "Receptive"][:18750]
males_final = pd.concat([male_distanced,male_expert,male_learner,male_receptive])
females = df[df["Gender"] == "Female"]
female_distanced = females[females["member_segmentation"] == "Distanced"][:18750]
female_expert = females[females["member_segmentation"] == "Expert"][:18750]
female_learner = females[females["member_segmentation"] == "Learner"][:18750]
female_receptive = females[females["member_segmentation"] == "Receptive"][:18750]
females_final = pd.concat([female_distanced,female_expert,female_learner,female_receptive])
final_sample = pd.concat([males_final,females_final])
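This gets me a sample of 18,750 people for each combination of gender and segment, so the sample is balanced on both variables, but it seems like a lot of code for something simple.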
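
One possible shorter route, offered as a sketch rather than a definitive answer: group on both columns at once and keep the first n rows of each group with `groupby(...).head(n)`. The helper name `balanced_head` and the tiny `demo` frame are made up for illustration; the real data would come from `Explorer_Data.csv` with n=18750. This assumes that taking the first rows of each group, as the slicing above does, is what is wanted.

```python
import pandas as pd


def balanced_head(df, keys, n):
    """Keep the first n rows of every combination of the key columns.

    Hypothetical helper: rows come back in their original order, not
    stacked group by group as with pd.concat.
    """
    return df.groupby(keys).head(n)


# Small synthetic frame standing in for Explorer_Data.csv (made-up data).
demo = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Female",
               "Female", "Female", "Male", "Female"],
    "member_segmentation": ["Expert", "Expert", "Expert", "Expert",
                            "Learner", "Learner", "Learner", "Learner"],
})

# With the real data this would be n=18750.
sample = balanced_head(demo, ["Gender", "member_segmentation"], n=1)

# Check how many rows each Gender/segment combination contributed.
group_sizes = sample.groupby(["Gender", "member_segmentation"]).size()
print(group_sizes)
```

Two caveats. Unlike the code above, this keeps every value that appears in the two columns, so if the file contains other genders or segments they would need to be filtered out first (for example with `isin`). A group with fewer than n rows is returned whole, so the result is only balanced when every group has at least n rows. If a random draw per group is preferred over the first rows, recent pandas versions (1.1 and later) also offer `groupby(...).sample(n=...)`.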