Question

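I have a pandas dataframe in which two columns correspond to people's names. The columns are related: the same name refers to the same person. I want to assign category codes that are valid across the whole "name" space.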

For example, my dataframe is

df = pd.DataFrame({"P1":["a","b","c","a"], "P2":["b","c","d","c"]})

>>> df
  P1 P2
0  a  b
1  b  c
2  c  d
3  a  c

I would like to replace the names with the corresponding category codes, e.g.

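The categories are effectively derived from the concatenated array ["a", "b", "c", "d"] and then applied to each column separately. How can I achieve this?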
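A minimal sketch of the approach the question describes, assuming a reasonably recent pandas: build the sorted union of names across both columns, then encode each column with `pd.Categorical` against that same category list, so a given name gets the same code in P1 and P2. The names `cols`, `categories` and `codes` are illustrative. Categorical codes are 0-based; add 1 if you want the 1-based codes produced in the answers.

```python
import pandas as pd

df = pd.DataFrame({"P1": ["a", "b", "c", "a"], "P2": ["b", "c", "d", "c"]})
cols = ["P1", "P2"]

# Shared "name space": sorted union of the values in all name columns.
categories = sorted(pd.unique(df[cols].to_numpy().ravel()))

# Encode every column against the SAME category list, so equal names
# receive equal codes regardless of which column they appear in.
codes = pd.DataFrame(
    {c: pd.Categorical(df[c], categories=categories).codes for c in cols},
    index=df.index,
)
print(codes)
```

Any value missing from `categories` would get the code -1, but here the category list is built from the data itself, so that cannot happen.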

Answer 1

You can do:


Answer 2

Use:

print (df.stack().rank(method='dense').astype(int).unstack())
   P1  P2
0   1   2
1   2   3
2   3   4
3   1   3
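To see why the dense rank gives shared codes, here is an illustrative sketch (a variant, not part of the original answer) that runs the same `stack()` step and compares the dense rank with 0-based codes from a categorical dtype. Because `stack()` puts every name into one Series, both methods see a single set of values, and the dense rank equals the categorical code plus 1. The names `stacked`, `ranked` and `codes` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"P1": ["a", "b", "c", "a"], "P2": ["b", "c", "d", "c"]})

# stack() turns both columns into one Series indexed by (row, column),
# so all names share a single pool of values.
stacked = df.stack()

# Dense rank: equal names get equal ranks; ranks are consecutive from 1.
ranked = stacked.rank(method="dense").astype(int).unstack()

# Same mapping as 0-based codes: string categories are inferred in sorted order.
codes = stacked.astype("category").cat.codes.unstack()

print(ranked)
print(codes)
```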

Edit:

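For a more general solution I used the approach from another answer, because the stack/unstack version fails when the index contains duplicate values (`unstack` cannot reshape duplicate entries):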

df = pd.DataFrame({"P1":["a","b","c","a"],
                   "P2":["b","c","d","c"],
                   "A":[3,4,5,6]}, index=[2,2,3,3])

print (df)
   A P1 P2
2  3  a  b
2  4  b  c
3  5  c  d
3  6  a  c

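cols = ['P1','P2']
# Flatten both columns row by row into one array, factorize it into a single
# shared code space (0-based, in order of first appearance), add 1, and
# reshape back to one column per name column.
df[cols] = (pd.factorize(df[cols].values.ravel())[0]+1).reshape(-1, len(cols))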
print (df)
   A  P1  P2
2  3   1   2
2  4   2   3
3  5   3   4
3  6   1   3
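The sketch below, assuming a recent pandas version, reproduces the duplicate-index problem mentioned in the edit and then applies the `factorize` route with `sort=True`, which makes the codes follow sorted order rather than order of first appearance (for this data the two orders happen to coincide). The names `stack_failed`, `codes`, `uniques` and `out` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"P1": ["a", "b", "c", "a"], "P2": ["b", "c", "d", "c"], "A": [3, 4, 5, 6]},
    index=[2, 2, 3, 3],
)
cols = ["P1", "P2"]

# With a duplicated row index, the (row, column) pairs after stack() are not
# unique, so unstack() cannot reshape them back and raises ValueError.
try:
    df[cols].stack().rank(method="dense").astype(int).unstack()
    stack_failed = False
except ValueError:
    stack_failed = True

# factorize works on the flat NumPy array and never looks at the index.
# sort=True assigns codes in sorted order of the unique values.
codes, uniques = pd.factorize(df[cols].to_numpy().ravel(), sort=True)
out = df.copy()
out[cols] = (codes + 1).reshape(-1, len(cols))
print(out)
```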

pandas dataframe category codes from two columns
