I have a pandas DataFrame and a function that returns a vector, and I want to "append" these vectors to my frame like this:
for i in range(len(Training_Frame)):
    Training_Frame.insert(i, 'ONEcoded', onehot_processing(int(float(Training_Frame.iloc[i,-1])), numberOFclasses))
As an example, suppose the onehot_processing function returns [0,0,0,1,0] every time, and the first rows of Training_Frame look like:
row0 [1,2,3,4,5]
row1 [2,3,4,5,3]
.............
How can I append the returned vector so that the rows become:
row0 [1,2,3,4,5,0,0,0,1,0]
row1 [2,3,4,5,3,0,0,0,1,0]
.............
The error message I get when I run the loop is:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-9-1a385098ae56> in <module>()
1 for i in range(len(Training_Frame)):
----> 2 Training_Frame.insert(i,'ONEcoded',onehot_processing(int(float(Training_Frame.iloc[i,-1])),numberOFclasses))
c:\users\engine\appdata\local\programs\python\python35\lib\site-packages\pandas\core\frame.py in insert(self, loc, column, value, allow_duplicates)
2507 """
2508 self._ensure_valid_index(value)
-> 2509 value = self._sanitize_column(column, value, broadcast=False)
2510 self._data.insert(loc, column, value,
2511 allow_duplicates=allow_duplicates)
c:\users\engine\appdata\local\programs\python\python35\lib\site-packages\pandas\core\frame.py in _sanitize_column(self, key, value, broadcast)
2654
2655 # turn me into an ndarray
-> 2656 value = _sanitize_index(value, self.index, copy=False)
2657 if not isinstance(value, (np.ndarray, Index)):
2658 if isinstance(value, list) and len(value) > 0:
c:\users\engine\appdata\local\programs\python\python35\lib\site-packages\pandas\core\series.py in _sanitize_index(data, index, copy)
2798
2799 if len(data) != len(index):
-> 2800 raise ValueError('Length of values does not match length of ' 'index')
2801
2802 if isinstance(data, PeriodIndex):
ValueError: Length of values does not match length of index
Update
# Function to generate a one-hot coded vector based on
# the number of classes and the given class ID
def onehot_processing(classID, numberOFclasses):
    one_hot_vector = np.zeros(numberOFclasses)
    one_hot_vector[classID-1] = 1
    return one_hot_vector
The ClassID is stored in the last column of the data frame.
Answer 0 (score: 1)
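For context, the ValueError in the question arises because DataFrame.insert adds one whole column, so the value it receives needs one entry per row, not one entry per class. A minimal sketch of that behavior, using a small hypothetical two-row frame (the frame and its column name `x` are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical two-row frame standing in for Training_Frame.
df = pd.DataFrame({"x": [1, 2]})

# A 5-element vector cannot become a column of a 2-row frame.
try:
    df.insert(0, "ONEcoded", [0, 0, 0, 1, 0])
    message = None
except ValueError as err:
    message = str(err)
print(message)

# A value with one entry per row is accepted as a new column.
df.insert(0, "ONEcoded", [7, 8])
print(df)
```

Calling insert again with the same column name would also fail by default, since duplicate column names are only allowed with allow_duplicates=True.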
For me, appending the vector to a list column works:
a = [0,0,0,1,0]
df = pd.DataFrame({'a':[[1,2,3,4,5],[2,3,4,5,3]]}, index=['row0','row1'])
print (df)
a
row0 [1, 2, 3, 4, 5]
row1 [2, 3, 4, 5, 3]
df['a'] = df['a'] + a
print (df)
a
row0 [1, 2, 3, 4, 5, 0, 0, 0, 1, 0]
row1 [2, 3, 4, 5, 3, 0, 0, 0, 1, 0]
So use Series.apply with a custom function:
def onehot_processing(x):
    # sample only
    return [0,0,0,1,0]
df['a'] = df['a'].apply(lambda x : x + onehot_processing(x))
print (df)
a
row0 [1, 2, 3, 4, 5, 0, 0, 0, 1, 0]
row1 [2, 3, 4, 5, 3, 0, 0, 0, 1, 0]
A solution with DataFrame.apply:
df['b'] = df.apply(lambda x : x['a'] + onehot_processing(x['a']), axis=1)
print (df)
a b
row0 [1, 2, 3, 4, 5] [1, 2, 3, 4, 5, 0, 0, 0, 1, 0]
row1 [2, 3, 4, 5, 3] [2, 3, 4, 5, 3, 0, 0, 0, 1, 0]
You can also change the function a little:
def onehot_processing(classID, numberOFclasses):
    one_hot_vector = np.zeros(numberOFclasses)
    one_hot_vector[np.array(classID)-1] = 1
    return one_hot_vector.astype(int).tolist()
df['b'] = df.apply(lambda x : onehot_processing(x['a'], 8), axis=1)
print (df)
a b
row0 [1, 2, 3, 4, 5] [1, 1, 1, 1, 1, 0, 0, 0]
row1 [2, 3, 4, 5, 3] [0, 1, 1, 1, 1, 0, 0, 0]
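The output above has several ones per row because each cell of column a holds a list of class IDs, and NumPy index assignment sets every listed position at once. A small sketch of the same function called with a scalar and with a list may make the difference clearer (the sample arguments are made up for illustration):

```python
import numpy as np

def onehot_processing(classID, numberOFclasses):
    # Same function as in the answer: class IDs are 1-based.
    one_hot_vector = np.zeros(numberOFclasses)
    one_hot_vector[np.array(classID) - 1] = 1
    return one_hot_vector.astype(int).tolist()

# A scalar class ID sets exactly one position.
single = onehot_processing(4, 5)

# A list of class IDs sets every listed position; duplicates have no extra effect.
multi = onehot_processing([2, 3, 3], 5)

print(single)
print(multi)
```

So when the column holds a single class ID per row, as in the question, the function yields a true one-hot vector.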
Engine's working code:
Training_Frame['b'] = Training_Frame.apply(lambda x : onehot_processing(int(float(x['Onehot_column'])), numberOFclasses), axis=1)
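If the goal is one separate column per class rather than a list stored in a single cell, the lists can be expanded and joined to the frame with pd.concat. This is only a sketch under assumptions: the sample data, the feature column names f1 and f2, and the class_1 ... class_5 column names are hypothetical and not taken from the question.

```python
import numpy as np
import pandas as pd

def onehot_processing(classID, numberOFclasses):
    # One-hot vector for a 1-based class ID, returned as a list of ints.
    one_hot_vector = np.zeros(numberOFclasses)
    one_hot_vector[classID - 1] = 1
    return one_hot_vector.astype(int).tolist()

numberOFclasses = 5

# Hypothetical stand-in for Training_Frame; the class ID sits in the last column.
Training_Frame = pd.DataFrame(
    {"f1": [1, 2], "f2": [2, 3], "Onehot_column": ["4.0", "1.0"]},
    index=["row0", "row1"],
)

# One one-hot list per row, computed from the last column.
vectors = Training_Frame.iloc[:, -1].apply(
    lambda v: onehot_processing(int(float(v)), numberOFclasses)
)

# Expand the lists into one column per class, keeping the original index.
onehot_columns = pd.DataFrame(
    vectors.tolist(),
    index=Training_Frame.index,
    columns=["class_{}".format(k + 1) for k in range(numberOFclasses)],
)

# Place the new columns side by side with the original ones.
result = pd.concat([Training_Frame, onehot_columns], axis=1)
print(result)
```

Passing index=Training_Frame.index when building onehot_columns keeps the rows aligned during the concat, which matters if the frame does not use a default integer index.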