Splitting a DataFrame into equal-sized DataFrames based on values

Time: 2019-05-27 02:22:24

Tags: python pandas

I'm working on a user testing panel that needs equal representation of panel members across two variables.

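I've done this the hard way, by manually splitting and recombining the data, and I'm hoping there is a simpler approach that I've overlooked.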

import pandas as pd
import numpy as np

df = pd.read_csv("Desktop/Explorer_Data.csv")

males = df[df["Gender"] == "Male"]

male_distanced = males[males["member_segmentation"] == "Distanced"][:18750]
male_expert = males[males["member_segmentation"] == "Expert"][:18750]
male_learner = males[males["member_segmentation"] == "Learner"][:18750]
male_receptive = males[males["member_segmentation"] == "Receptive"][:18750]

males_final = pd.concat([male_distanced,male_expert,male_learner,male_receptive])

females = df[df["Gender"] == "Female"]

female_distanced = females[females["member_segmentation"] == "Distanced"][:18750]
female_expert = females[females["member_segmentation"] == "Expert"][:18750]
female_learner = females[females["member_segmentation"] == "Learner"][:18750]
female_receptive = females[females["member_segmentation"] == "Receptive"][:18750]

females_final = pd.concat([female_distanced,female_expert,female_learner,female_receptive])

final_sample = pd.concat([males_final,females_final])

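This gives me a sample with up to 18,750 people for each combination of gender and segment, so the groups are evenly represented, but it seems like a lot of code for something this simple.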
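One possible shortcut, offered as a sketch rather than a confirmed answer: `groupby(...).head(n)` keeps the first `n` rows of every group, which mirrors the `[:18750]` slices above. The helper name `balanced_sample` and the small `demo` frame are made up for illustration; only the column names `Gender` and `member_segmentation` come from the code above.

```python
import pandas as pd


def balanced_sample(df, keys, n):
    """Keep at most the first n rows for each combination of values in keys.

    Hypothetical helper: it wraps GroupBy.head, which preserves the
    original row order instead of stacking the groups one after another.
    """
    return df.groupby(keys, sort=False).head(n)


# Made-up stand-in for Explorer_Data.csv, just to show the behaviour.
demo = pd.DataFrame({
    "Gender": ["Male"] * 5 + ["Female"] * 4,
    "member_segmentation": [
        "Expert", "Expert", "Expert", "Learner", "Learner",
        "Expert", "Expert", "Learner", "Learner",
    ],
})

# With the real data this would be n=18750.
sample = balanced_sample(demo, ["Gender", "member_segmentation"], n=2)
group_sizes = sample.groupby(["Gender", "member_segmentation"]).size()
print(group_sizes)
```

Two differences from the manual version are worth keeping in mind. The rows stay in their original order rather than being concatenated group by group, and every value present in the two columns forms a group, so rows with a gender or segment outside the eight combinations listed above would need to be filtered out first. If a random draw is preferred over the first rows, `groupby(...).sample(n=...)` (available from pandas 1.1) is another option, though it raises an error when a group has fewer than `n` rows.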

0 answers:

No answers yet