Question

I have to assign labels to categorical data. Let's consider the iris example:

import pandas as pd
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

print "targets: ", np.unique(iris.target)
print "targets: ", iris.target.shape
print "target_names: ", np.unique(iris.target_names)
print "target_names: ", iris.target_names.shape

This prints:

targets:  [0 1 2]
targets:  (150L,)
target_names:  ['setosa' 'versicolor' 'virginica']
target_names:  (3L,)

To produce the desired labels I use pandas.Categorical.from_codes:

print pd.Categorical.from_codes(iris.target, iris.target_names)

[setosa, setosa, setosa, setosa, setosa, ..., virginica, virginica, virginica, virginica, virginica]
Length: 150
Categories (3, object): [setosa, versicolor, virginica]
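A minimal sketch of what from_codes does here: every code is used as a position into the categories array, so 0 becomes the first name, 1 the second, and so on. The arrays below are small made-up stand-ins for iris.target and iris.target_names, so the snippet runs without scikit-learn.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for iris.target / iris.target_names
codes = np.array([0, 0, 1, 2, 2])
names = np.array(['setosa', 'versicolor', 'virginica'])

# Each code indexes into `names`: 0 -> 'setosa', 1 -> 'versicolor', 2 -> 'virginica'
labels = pd.Categorical.from_codes(codes, names)
print(list(labels))
print(list(labels.categories))
```

This works for iris only because iris.target already contains exactly 0, 1 and 2.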

Let's try a different example:

# I define new targets
target = np.array([123,123,54,123,123,54,2,54,2])
target = np.array([1,1,3,1,1,3,2,3,2])
target_names = np.array(['paglia','gioele','papa'])
#---
print "targets: ", np.unique(target)
print "targets: ", target.shape
print "target_names: ", np.unique(target_names)
print "target_names: ", target_names.shape

If I again try to convert the categorical values into labels:

print pd.Categorical.from_codes(target, target_names)

I get this error message:

C:\Users\ianni\Anaconda2\lib\site-packages\pandas\core\categorical.pyc in from_codes(cls, codes, categories, ordered)
    459
    460         if len(codes) and (codes.max() >= len(categories) or codes.min() < -1):
--> 461             raise ValueError("codes need to be between -1 and "
    462                              "len(categories)-1")
    463

ValueError: codes need to be between -1 and len(categories)-1

Do you know why?

Answer 1

Do you know why?

Look closely at the error traceback:

In [128]: pd.Categorical.from_codes(target, target_names)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-128-c2b4f6ac2369> in <module>()
----> 1 pd.Categorical.from_codes(target, target_names)

~\Anaconda3_5.0\envs\py36\lib\site-packages\pandas\core\categorical.py in from_codes(cls, codes, categories, ordered)
    619
    620         if len(codes) and (codes.max() >= len(categories) or codes.min() < -1):
--> 621             raise ValueError("codes need to be between -1 and "
    622                              "len(categories)-1")
    623

ValueError: codes need to be between -1 and len(categories)-1

You will see that the following condition is met:

codes.max() >= len(categories)

In your case:

In [133]: target.max() >= len(target_names)
Out[133]: True
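The check from the traceback can be reproduced on its own with plain NumPy. codes_in_range is a hypothetical helper that mirrors the condition shown in the pandas source above; it is not a pandas function, and newer pandas versions may structure the check differently. It also shows that the question's second target array (1, 2, 3) fails too, because 3 == len(target_names).

```python
import numpy as np

def codes_in_range(codes, categories):
    """Hypothetical helper mirroring the check in the traceback:
    codes are accepted only if -1 <= code <= len(categories) - 1."""
    codes = np.asarray(codes)
    if len(codes) == 0:
        return True
    return not (codes.max() >= len(categories) or codes.min() < -1)

target_names = np.array(['paglia', 'gioele', 'papa'])
print(codes_in_range([123, 123, 54, 123, 123, 54, 2, 54, 2], target_names))  # False: 123 >= 3
print(codes_in_range([1, 1, 3, 1, 1, 3, 2, 3, 2], target_names))            # False: 3 >= 3
print(codes_in_range([0, 0, 2, 0, 0, 2, 1, 2, 1], target_names))            # True
```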

In other words, pd.Categorical.from_codes() expects codes to be positions in categories: integers from 0 to len(categories) - 1, with -1 reserved for missing values.
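A short sketch of both ends of that range, using the question's names: -1 is accepted and turns into a missing value, while any code equal to or above len(categories) raises the same ValueError as in the question. The exact error text may vary between pandas versions.

```python
import pandas as pd

names = ['paglia', 'gioele', 'papa']

# -1 is the only code allowed outside 0..len(names)-1: it marks a missing value (NaN)
cat = pd.Categorical.from_codes([0, -1, 2], names)
print(list(cat))

# A code >= len(names) is rejected, just like 123 or 3 in the question
rejected = False
try:
    pd.Categorical.from_codes([0, 3], names)
except ValueError as exc:
    rejected = True
    print("ValueError:", exc)
```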

Workaround:

In [173]: target
Out[173]: array([123, 123, 54, 123, 123, 54, 2, 54, 2])

Helper dicts:

In [174]: mapping = dict(zip(np.unique(target), np.arange(len(target_names))))

In [175]: mapping
Out[175]: {2: 0, 54: 1, 123: 2}

In [176]: reverse_mapping = {v:k for k,v in mapping.items()}

In [177]: reverse_mapping
Out[177]: {0: 2, 1: 54, 2: 123}
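A vectorized alternative to the helper dict (a swapped-in technique, not the one used in this answer): np.unique with return_inverse=True returns the sorted distinct values together with each element's position among them, which is exactly a 0-based code. Like the dict above, this assumes the i-th name in target_names belongs to the i-th smallest target value.

```python
import numpy as np
import pandas as pd

target = np.array([123, 123, 54, 123, 123, 54, 2, 54, 2])
target_names = np.array(['paglia', 'gioele', 'papa'])

# uniques: sorted distinct values; codes: index of each element within uniques
uniques, codes = np.unique(target, return_inverse=True)
print(uniques)  # [  2  54 123]
print(codes)    # [2 2 1 2 2 1 0 1 0]

# codes now lie in 0..len(target_names)-1, so from_codes accepts them
ser = pd.Categorical.from_codes(codes, target_names)
print(list(ser))
```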

Build the categorical series:

In [178]: ser = pd.Categorical.from_codes(pd.Series(target).map(mapping), target_names)

In [179]: ser
Out[179]:
[papa, papa, gioele, papa, papa, gioele, paglia, gioele, paglia]
Categories (3, object): [paglia, gioele, papa]

Reverse mapping:

In [180]: pd.Series(ser.codes).map(reverse_mapping)
Out[180]:
0    123
1    123
2     54
3    123
4    123
5     54
6      2
7     54
8      2
dtype: int64
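If the codes come from np.unique(..., return_inverse=True) (again a swapped-in approach, not the answer's), the reverse mapping needs no dict: indexing the sorted unique values with the categorical's codes recovers the original numbers. This sketch assumes there are no missing values, because a -1 code would silently wrap around to the last element.

```python
import numpy as np
import pandas as pd

target = np.array([123, 123, 54, 123, 123, 54, 2, 54, 2])
target_names = np.array(['paglia', 'gioele', 'papa'])

uniques, codes = np.unique(target, return_inverse=True)
ser = pd.Categorical.from_codes(codes, target_names)

# Position k in `uniques` holds the original value for code k (assumes no -1 codes)
restored = uniques[ser.codes]
print(restored)  # [123 123  54 123 123  54   2  54   2]
```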
