Question

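TL;DR: What is the most concise way to encode ordered categories as numbers using a specific encoding conversion (i.e., one that preserves the ordinal nature of the categories)?

['weak', 'normal', 'strong'] -> [0, 1, 2]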

Suppose I have an ordered categorical variable, similar to the example here:

import pandas as pd
raw_data = {'patient': [1, 1, 1, 2, 2], 
        'obs': [1, 2, 3, 1, 2], 
        'treatment': [0, 1, 0, 1, 0],
        'score': ['strong', 'weak', 'normal', 'weak', 'strong']} 
df = pd.DataFrame(raw_data, columns = ['patient', 'obs', 'treatment', 'score'])
df


   patient  obs  treatment   score
0        1    1          0  strong
1        1    2          1    weak
2        1    3          0  normal
3        2    1          1    weak
4        2    2          0  strong

I can create a function and apply it to my data frame to get the desired conversion:

def score_to_numeric(x):
    if x=='strong':
        return 3
    if x=='normal':
        return 2
    if x=='weak':
        return 1

df['score_num'] = df['score'].apply(score_to_numeric)
df

   patient  obs  treatment   score  score_num
0        1    1          0  strong          3
1        1    2          1    weak          1
2        1    3          0  normal          2
3        2    1          1    weak          1
4        2    2          0  strong          3

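My question: is there a way to do this inline, without having to define a dedicated score_to_numeric function?

Maybe using some kind of lambda or replace functionality? Alternatively, this SO post suggests that sklearn's LabelEncoder() is quite powerful and, by extension, might handle this somehow, but I haven't figured it out...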

Answer 1

You can use map() together with a dictionary that contains the mapping:

In [5]: d = {'strong':3, 'normal':2, 'weak':1}

In [7]: df['score_num'] = df.score.map(d)

In [8]: df
Out[8]:
   patient  obs  treatment   score  score_num
0        1    1          0  strong          3
1        1    2          1    weak          1
2        1    3          0  normal          2
3        2    1          1    weak          1
4        2    2          0  strong          3
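
As a possible alternative (a sketch, not part of the original answer), an ordered pandas Categorical can produce the codes directly from an explicit category order. The column names score_code and score_num and the list name order are chosen here for illustration only. The codes start at 0, matching the [0, 1, 2] form in the question; adding 1 reproduces the 1..3 mapping used above.

```python
import pandas as pd

df = pd.DataFrame({'score': ['strong', 'weak', 'normal', 'weak', 'strong']})

# Explicit category order, lowest to highest (illustrative name).
order = ['weak', 'normal', 'strong']

# An ordered categorical stores each value as an integer code
# equal to its position in `order`.
score_cat = pd.Categorical(df['score'], categories=order, ordered=True)

df['score_code'] = score_cat.codes        # weak=0, normal=1, strong=2
df['score_num'] = df['score_code'] + 1    # shift to the 1..3 mapping used above

print(df)
```

One difference worth keeping in mind: a value that is not listed in order gets the code -1 here, whereas map() with a dictionary gives NaN for a key that is missing from the dictionary.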

Python: Encode ordered categories/factors to numeric w/ specific encoding conversion