Question

I know this is probably a very simple question, but I'm new to Python and I'm not sure how best to work with a pandas DataFrame.

Example data:

Job         Skill           RelationType
Director    Manage staff    essential
Director    Manage staff    optional

Goal

Desired result for the same data:

Job         Skill           RelationType
Director    Manage staff    essential
Director    Manage staff    essential

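Ideally, I'd like to write a function that handles the case where the Skill is the same but the RelationType differs: the RelationType should be overwritten with essential. In other words, for the same job, an essential skill always takes precedence over an optional one.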

Solved

df['RelationType'] = df.groupby(['Job', 'Skill'])['RelationType'].transform('min')
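
For anyone who wants to try the one-liner above, here is a minimal self-contained sketch. It assumes the column is named Job, as in the sample data, and the third row is a made-up extra to show that unrelated skills are left alone. Note that it only works because 'essential' happens to sort before 'optional' alphabetically, so the string minimum is also the higher-priority value.

```python
import pandas as pd

# Sample data rebuilt from the question; the "Write reports" row is hypothetical
df = pd.DataFrame({
    "Job": ["Director", "Director", "Director"],
    "Skill": ["Manage staff", "Manage staff", "Write reports"],
    "RelationType": ["essential", "optional", "optional"],
})

# Per (Job, Skill) group, take the smallest string and broadcast it to every row.
# 'essential' < 'optional' lexicographically, so essential wins when both occur.
df["RelationType"] = df.groupby(["Job", "Skill"])["RelationType"].transform("min")

print(df)
```

If the labels were ever renamed so that alphabetical order no longer matched priority, this shortcut would silently pick the wrong value, which is the case an explicit ordering guards against.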

Answer 1

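Categorical data is useful for this task. First convert RelationType to an ordered categorical series, listing the categories in priority order.

Then use GroupBy on the key columns and take the min, which selects the highest-priority category in each group.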

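import pandas as pd

# Ordered categorical: 'essential' ranks below 'optional', so min() picks it
df['RelationType'] = pd.Categorical(df['RelationType'], ordered=True,
                                    categories=['essential', 'optional'])

# Select the column explicitly so transform returns a Series
df['RelationType'] = df.groupby(['Job', 'Skill'])['RelationType'].transform('min')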

print(df)

        Job        Skill RelationType
0  Director  ManageStaff    essential
1  Director  ManageStaff    essential
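
Since the question asked for a function, here is one possible sketch of the same idea wrapped up for reuse. It is a variant rather than the exact code above: instead of a categorical dtype it maps each label to a numeric rank, takes the per-group minimum, and maps back. The name promote_relation_type and the PRIORITY list are made up for illustration, and it assumes every RelationType value appears in the priority list.

```python
import pandas as pd

# Assumed priority order, highest priority first (hypothetical constant)
PRIORITY = ["essential", "optional"]


def promote_relation_type(df, priority=PRIORITY):
    """Hypothetical helper: within each (Job, Skill) group, set
    RelationType to the highest-priority value present in that group."""
    rank = {name: i for i, name in enumerate(priority)}
    out = df.copy()  # leave the caller's frame untouched
    # Lower rank means higher priority, so the group minimum is the winner
    best = (
        out["RelationType"]
        .map(rank)
        .groupby([out["Job"], out["Skill"]])
        .transform("min")
    )
    # Translate the winning rank back to its label
    out["RelationType"] = best.map(dict(enumerate(priority)))
    return out


# Sample rows; everything except the first two is made up for illustration
jobs = pd.DataFrame({
    "Job": ["Director", "Director", "Director", "Analyst"],
    "Skill": ["Manage staff", "Manage staff", "Write reports", "Manage staff"],
    "RelationType": ["essential", "optional", "optional", "optional"],
})
result = promote_relation_type(jobs)
print(result)
```

Passing the order explicitly keeps the result independent of how the labels happen to sort as strings, and the column stays a plain string column afterwards.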

Data manipulation in pandas - Python
