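Shuffle pandas rows, but in an ordered way

Time: 2020-06-06 06:05:31

Tags: python python-3.x pandas

Say I have a dataframe with three columns: Age, Gender and Country.

I want to shuffle it randomly, but in an ordered way according to gender. There are n males and m females, where n can be less than, greater than, or equal to m. The shuffling should produce results like these, for example for 8 people:

Male, Female, Male, Female, Male, Female, Female, Female... (if there are more females: m > n)
Male, Female, Male, Female, Male, Male, Male, Male (if there are more males: n > m)
Male, Female, Male, Female, Male, Female, Male, Female, Male, Female (if the numbers are equal: n = m)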

import pandas as pd

df = pd.DataFrame({'Age': [10, 20, 30, 40, 50, 60, 70, 80],
                   'Gender': ["Male", "Male", "Male", "Female", "Female", "Male", "Female", "Female"],
                   'Country': ["US", "UK", "China", "Canada", "US", "UK", "China", "Brazil"]})
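The target ordering described above can be written down as a small helper, which is handy for checking any proposed solution. `expected_pattern` is a hypothetical name (not part of pandas), and it assumes, as in the examples, that each pair starts with Male and the leftovers of the larger group go at the end.

```python
def expected_pattern(n_male, n_female):
    """Target gender sequence: alternate Male/Female (Male first),
    then append the leftover rows of the larger group."""
    pairs = min(n_male, n_female)
    pattern = ['Male', 'Female'] * pairs
    pattern += ['Male'] * (n_male - pairs) + ['Female'] * (n_female - pairs)
    return pattern


print(expected_pattern(3, 5))
# ['Male', 'Female', 'Male', 'Female', 'Male', 'Female', 'Female', 'Female']
print(expected_pattern(6, 2))
# ['Male', 'Female', 'Male', 'Female', 'Male', 'Male', 'Male', 'Male']
```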

2 answers:

Answer 0 (score: 2)

First, add a sequence number within each group:

df['Order'] = df.groupby('Gender').cumcount()
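As an illustration on the question's data, this is what the sequence number looks like: `cumcount()` gives each row its position within its own Gender group, counted in the frame's current row order.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({'Age': [10, 20, 30, 40, 50, 60, 70, 80],
                   'Gender': ["Male", "Male", "Male", "Female", "Female", "Male", "Female", "Female"],
                   'Country': ["US", "UK", "China", "Canada", "US", "UK", "China", "Brazil"]})

# Number rows 0, 1, 2, ... separately within each Gender group
df['Order'] = df.groupby('Gender').cumcount()
print(df['Order'].tolist())
# [0, 1, 2, 0, 1, 3, 2, 3]
```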

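Then sort by it. Adding Gender as a secondary key (descending, since "Male" sorts after "Female") keeps Male first within each pair even after the rows have been shuffled:

df.sort_values(['Order', 'Gender'], ascending=[True, False])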

This gives you:

   Age  Gender Country  Order
0   10    Male      US      0
3   40  Female  Canada      0
1   20    Male      UK      1
4   50  Female      US      1
2   30    Male   China      2
6   70  Female   China      2
5   60    Male      UK      3
7   80  Female  Brazil      3

If you want the rows shuffled, do that first, e.g. df = df.sample(frac=1); see: Shuffle DataFrame rows
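Putting the shuffle and the sequence-number steps together, a minimal sketch might look like this. `interleave_by_gender` and its `seed` parameter are hypothetical names introduced here for illustration. The test frame is the question's data with the 20-year-old changed to Female (made up so there are 3 males and 5 females), which shows the two extra Female rows landing at the end.

```python
import pandas as pd


def interleave_by_gender(df, seed=None):
    """Hypothetical helper: shuffle rows, then alternate Male/Female,
    putting the leftover rows of the larger group last.
    `seed` is passed to sample() as random_state for reproducibility."""
    # 1. Shuffle all rows (random order within each gender)
    shuffled = df.sample(frac=1, random_state=seed)
    # 2. Number rows within each gender group in their shuffled order
    order = shuffled.groupby('Gender').cumcount()
    # 3. Sort by that number; on ties put 'Male' before 'Female'
    #    ('Male' > 'Female' alphabetically, hence descending on Gender)
    return (shuffled.assign(Order=order)
                    .sort_values(['Order', 'Gender'], ascending=[True, False])
                    .drop(columns='Order')
                    .reset_index(drop=True))


# Illustrative data: 3 males, 5 females
df = pd.DataFrame({'Age': [10, 20, 30, 40, 50, 60, 70, 80],
                   'Gender': ["Male", "Female", "Male", "Female", "Female", "Male", "Female", "Female"],
                   'Country': ["US", "UK", "China", "Canada", "US", "UK", "China", "Brazil"]})

result = interleave_by_gender(df, seed=0)
print(result['Gender'].tolist())
# ['Male', 'Female', 'Male', 'Female', 'Male', 'Female', 'Female', 'Female']
```

Only the gender sequence is fixed; which Male or Female row lands in each slot depends on the shuffle.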

Answer 1 (score: 0)

Create two new dataframes with a 'Sort_Column', giving the df_male dataframe even values and the df_female dataframe odd values. Then put them back together with pd.concat and call .sort_values() on 'Sort_Column':

import pandas as pd

df = pd.DataFrame({'Age': [10, 20, 30, 40, 50, 60, 70, 80],
                   'Gender': ["Male", "Male", "Male", "Female", "Female", "Male", "Female", "Female"],
                   'Country': ["US", "UK", "China", "Canada", "US", "UK", "China", "Brazil"]})
df['Sort_Column'] = 0
# Male rows get even keys: 0, 2, 4, ...
df_male = df.loc[df['Gender'] == 'Male'].reset_index(drop=True)
df_male['Sort_Column'] = df_male['Sort_Column'] + df_male.index * 2
# Female rows get odd keys: 1, 3, 5, ...
df_female = df.loc[df['Gender'] == 'Female'].reset_index(drop=True)
df_female['Sort_Column'] = df_female['Sort_Column'] + df_female.index * 2 + 1
# Recombine, sort on the interleaving key, then drop the helper column
df_sorted = pd.concat([df_male, df_female]).sort_values('Sort_Column').drop('Sort_Column', axis=1).reset_index(drop=True)
df_sorted

Output:

    Age Gender  Country
0   10  Male    US
1   40  Female  Canada
2   20  Male    UK
3   50  Female  US
4   30  Male    China
5   70  Female  China
6   60  Male    UK
7   80  Female  Brazil
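The even/odd idea can also be expressed without splitting and re-concatenating the frame. This minimal sketch is a variation, not the answerer's code: it builds the same key as 2 * (position within the gender group) + (1 if Female else 0) using `groupby().cumcount()`, and on the question's data it reproduces the output shown above.

```python
import pandas as pd

df = pd.DataFrame({'Age': [10, 20, 30, 40, 50, 60, 70, 80],
                   'Gender': ["Male", "Male", "Male", "Female", "Female", "Male", "Female", "Female"],
                   'Country': ["US", "UK", "China", "Canada", "US", "UK", "China", "Brazil"]})

# Even keys for Male (0, 2, 4, ...), odd keys for Female (1, 3, 5, ...)
sort_key = df.groupby('Gender').cumcount() * 2 + (df['Gender'] == 'Female').astype(int)

df_sorted = (df.assign(Sort_Column=sort_key)
               .sort_values('Sort_Column')
               .drop(columns='Sort_Column')
               .reset_index(drop=True))
print(df_sorted['Age'].tolist())
# [10, 40, 20, 50, 30, 70, 60, 80]
```

With unequal group sizes the larger group's extra keys are simply higher than every key of the smaller group, so its leftover rows end up at the end.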