Add an extra column to a pandas DataFrame that depends on another column

Time: 2017-09-12 10:11:31

Tags: python pandas

I built a pandas DataFrame from the Iris dataset, and I want to add an extra column called SpecieID. Iris-setosa should get ID 0, Iris-versicolor ID 1, and Iris-virginica ID 2.

I tried this code:

def create_specie_id():
    if iris["Species"] == "Iris-setosa":
        ID = 0
    elif iris["Species"] == "Iris-versicolor":
        ID = 1
    elif iris["Species"] == "Iris-virginica":
        ID = 2
    return ID

iris = iris.assign(SpecieID = lambda x: create_specie_id())

print (iris)

But I got the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-58-2abd69ffef4b> in <module>()
     10     return ID
     11 
---> 12 iris = iris.assign(SpecieID = lambda x: create_specie_id())
     13 
     14 print (iris)

C:\Users\masc\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\frame.py in assign(self, **kwargs)
   2495         results = {}
   2496         for k, v in kwargs.items():
-> 2497             results[k] = com._apply_if_callable(v, data)
   2498 
   2499         # ... and then assign

C:\Users\masc\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\common.py in _apply_if_callable(maybe_callable, obj, **kwargs)
    439     """
    440     if callable(maybe_callable):
--> 441         return maybe_callable(obj, **kwargs)
    442     return maybe_callable
    443 

<ipython-input-58-2abd69ffef4b> in <lambda>(x)
     10     return ID
     11 
---> 12 iris = iris.assign(SpecieID = lambda x: create_specie_id())
     13 
     14 print (iris)

<ipython-input-58-2abd69ffef4b> in create_specie_id()
      2 
      3 def create_specie_id():
----> 4     if iris["Species"] == "Iris-setosa":
      5         ID = 0
      6     elif iris["Species"] == "Iris-versicolor":

C:\Users\masc\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
    953         raise ValueError("The truth value of a {0} is ambiguous. "
    954                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 955                          .format(self.__class__.__name__))
    956 
    957     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How can I create a column containing the SpecieID?

1 answer:

Answer 0 (score: 1)

You can use numpy.select:

import numpy as np
import pandas as pd

iris = pd.DataFrame({'Species': ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica', 'another']})

m1 =  iris["Species"] == "Iris-setosa"
m2 =  iris["Species"] == "Iris-versicolor"
m3 =  iris["Species"] == "Iris-virginica"

iris['ID'] = np.select([m1,m2,m3], [0,1,2], default=-1)

print (iris)
           Species  ID
0      Iris-setosa   0
1  Iris-versicolor   1
2   Iris-virginica   2
3          another  -1
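
The same idea can be fitted back into the `assign` call from the question, so the new column gets the name SpecieID that was asked for. This is only a sketch under assumptions: the `species_names` list and the small sample frame are illustrative and not part of the original answer. Each comparison produces a boolean Series covering every row, which is what `np.select` expects, instead of a single True/False for an `if` statement.

```python
import numpy as np
import pandas as pd

# Small sample frame standing in for the real Iris data (assumption)
iris = pd.DataFrame(
    {"Species": ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "another"]}
)

# Hypothetical helper list: position in the list is the ID to assign
species_names = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

iris = iris.assign(
    SpecieID=lambda df: np.select(
        # One element-wise boolean mask per species
        [df["Species"] == name for name in species_names],
        [0, 1, 2],
        default=-1,  # rows matching none of the masks
    )
)

print(iris)
```

Passing a lambda to `assign` means the masks are computed from the frame that `assign` receives, not from a global variable, which avoids the pattern that triggered the error in the question.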

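Another solution is to use map with a dict. Values that have no match in the dict become NaN, so fillna is added to replace them and astype to convert the result back to integers: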

d = { "Iris-setosa" : 0, "Iris-versicolor":1,  "Iris-virginica":2}
iris['ID'] = iris['Species'].map(d).fillna(-1).astype(int)

print (iris)
           Species  ID
0      Iris-setosa   0
1  Iris-versicolor   1
2   Iris-virginica   2
3          another  -1
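
To see why both `fillna` and `astype` are needed, it may help to look at the intermediate result of `map` on its own. The sketch below reuses the sample frame and dict from the answer; the variable name `mapped` is introduced here only for illustration.

```python
import pandas as pd

iris = pd.DataFrame(
    {"Species": ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "another"]}
)
d = {"Iris-setosa": 0, "Iris-versicolor": 1, "Iris-virginica": 2}

# Step 1: look up each species in the dict; "another" has no key, so it becomes NaN
mapped = iris["Species"].map(d)
print(mapped.isna().tolist())  # [False, False, False, True]
print(mapped.dtype)            # float64, because NaN forces a float column

# Step 2: replace NaN with a sentinel, then cast back to integers
iris["ID"] = mapped.fillna(-1).astype(int)
print(iris)
```

Without the `fillna` step, `astype(int)` would raise an error on the NaN value, so the order of the two calls matters.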