Question

I have a pandas DataFrame and a function that returns a vector. I want to "append" these vectors to my frame, like this:

for i in range(len(Training_Frame)):
    Training_Frame.insert(i,'ONEcoded',onehot_processing(int(float(Training_Frame.iloc[i,-1])),numberOFclasses))

As an example, suppose the onehot_processing function returns [0,0,0,1,0] every time. The first rows of Training_Frame look like:

row0 [1,2,3,4,5]
row1 [2,3,4,5,3]
.............

How can I append the returned vector so that the rows become:

row0 [1,2,3,4,5,0,0,0,1,0]
row1 [2,3,4,5,3,0,0,0,1,0]
.............

The error message I get when I run the loop is:

    ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-1a385098ae56> in <module>()
      1 for i in range(len(Training_Frame)):
----> 2     Training_Frame.insert(i,'ONEcoded',onehot_processing(int(float(Training_Frame.iloc[i,-1])),numberOFclasses))

c:\users\engine\appdata\local\programs\python\python35\lib\site-packages\pandas\core\frame.py in insert(self, loc, column, value, allow_duplicates)
   2507         """
   2508         self._ensure_valid_index(value)
-> 2509         value = self._sanitize_column(column, value, broadcast=False)
   2510         self._data.insert(loc, column, value,
   2511                           allow_duplicates=allow_duplicates)

c:\users\engine\appdata\local\programs\python\python35\lib\site-packages\pandas\core\frame.py in _sanitize_column(self, key, value, broadcast)
   2654 
   2655             # turn me into an ndarray
-> 2656             value = _sanitize_index(value, self.index, copy=False)
   2657             if not isinstance(value, (np.ndarray, Index)):
   2658                 if isinstance(value, list) and len(value) > 0:

c:\users\engine\appdata\local\programs\python\python35\lib\site-packages\pandas\core\series.py in _sanitize_index(data, index, copy)
   2798 
   2799     if len(data) != len(index):
-> 2800         raise ValueError('Length of values does not match length of ' 'index')
   2801 
   2802     if isinstance(data, PeriodIndex):

ValueError: Length of values does not match length of index

Update

import numpy as np

# Generate a one-hot coded vector based on the
# number of classes and the given class ID
def onehot_processing(classID, numberOFclasses):
    one_hot_vector = np.zeros(numberOFclasses)
    # classID - 1 is the position that gets set to 1
    one_hot_vector[classID - 1] = 1
    return one_hot_vector

The ClassID is stored in the last column of the data frame.

Answer 1

For me, adding the vector to a column of lists works:

import numpy as np
import pandas as pd

a = [0,0,0,1,0]

df = pd.DataFrame({'a':[[1,2,3,4,5],[2,3,4,5,3]]}, index=['row0','row1'])
print (df)
                    a
row0  [1, 2, 3, 4, 5]
row1  [2, 3, 4, 5, 3]

df['a'] = df['a'] + a
print (df)
                                   a
row0  [1, 2, 3, 4, 5, 0, 0, 0, 1, 0]
row1  [2, 3, 4, 5, 3, 0, 0, 0, 1, 0]

So use Series.apply with a custom function:

def onehot_processing(x):
    #sample only
    return [0,0,0,1,0]

df['a'] = df['a'].apply(lambda x : x + onehot_processing(x))
print (df)
                                   a
row0  [1, 2, 3, 4, 5, 0, 0, 0, 1, 0]
row1  [2, 3, 4, 5, 3, 0, 0, 0, 1, 0]

A solution with DataFrame.apply:

df['b'] = df.apply(lambda x : x['a'] + onehot_processing(x['a']), axis=1)
print (df)
                    a                               b
row0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5, 0, 0, 0, 1, 0]
row1  [2, 3, 4, 5, 3]  [2, 3, 4, 5, 3, 0, 0, 0, 1, 0]

You can also change the function slightly so that it returns a plain list of ints:

def onehot_processing(classID, numberOFclasses):
    one_hot_vector=np.zeros(numberOFclasses)
    one_hot_vector[np.array(classID)-1] = 1
    return (one_hot_vector.astype(int).tolist())


df['b'] = df.apply(lambda x : onehot_processing(x['a'], 8), axis=1)
print (df)
                    a                         b
row0  [1, 2, 3, 4, 5]  [1, 1, 1, 1, 1, 0, 0, 0]
row1  [2, 3, 4, 5, 3]  [0, 1, 1, 1, 1, 0, 0, 0]
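The modified function treats its first argument differently depending on whether it is a single class ID or a list of IDs. The sketch below only illustrates that behaviour; the sample inputs are made up for the illustration.

```python
import numpy as np

def onehot_processing(classID, numberOFclasses):
    one_hot_vector = np.zeros(numberOFclasses)
    # np.array(classID) - 1 is a single index for a scalar ID,
    # or an array of indices when classID is a list
    one_hot_vector[np.array(classID) - 1] = 1
    return one_hot_vector.astype(int).tolist()

# A scalar class ID marks exactly one position
single = onehot_processing(4, 5)
print(single)  # [0, 0, 0, 1, 0]

# A list of IDs marks every listed position, which is why the
# output above has several ones per row
multi = onehot_processing([1, 2, 3, 4, 5], 8)
print(multi)  # [1, 1, 1, 1, 1, 0, 0, 0]
```

If each row should carry a true one-hot vector, pass a single class ID rather than the whole list.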

Engine's working code:

  Training_Frame['b'] = Training_Frame.apply(lambda x : onehot_processing(int(float(x['Onehot_column'])), numberOFclasses), axis=1)
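For context, the original loop failed because DataFrame.insert adds a whole column, so the value passed to it must have one entry per row, while each one-hot vector has numberOFclasses entries. Below is a minimal, self-contained sketch of the working line. The frame contents and the column names f1, f2 and class_1 to class_5 are hypothetical, and the last step (spreading the vector over separate columns with pd.concat) is an optional extra that is not part of the answer above.

```python
import numpy as np
import pandas as pd

def onehot_processing(classID, numberOFclasses):
    one_hot_vector = np.zeros(numberOFclasses)
    one_hot_vector[classID - 1] = 1
    return one_hot_vector.astype(int).tolist()

numberOFclasses = 5

# Hypothetical frame: two feature columns plus the class ID stored as text
Training_Frame = pd.DataFrame(
    {"f1": [1, 2], "f2": [3, 4], "Onehot_column": ["4.0", "2.0"]},
    index=["row0", "row1"],
)

# One list per row, stored in a single object column
Training_Frame["b"] = Training_Frame.apply(
    lambda x: onehot_processing(int(float(x["Onehot_column"])), numberOFclasses),
    axis=1,
)

# Optional: spread each list over its own columns
onehot_cols = pd.DataFrame(
    Training_Frame["b"].tolist(),
    index=Training_Frame.index,
    columns=["class_%d" % k for k in range(1, numberOFclasses + 1)],
)
expanded = pd.concat([Training_Frame.drop(columns="b"), onehot_cols], axis=1)
print(expanded)
```

Keeping the lists in one column matches the answer above; the expanded form is closer to the row layout sketched in the question.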
