Question

I have a Pandas DataFrame with a categorical column "Category", which has about 1k distinct values, and a numeric column "Value".

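What I want:

Split the numeric column "Value" into N bands between its minimum and maximum. For the case above, with Nbands = 10, the bands are 0.1-10, 10-20, 20-30, ..., 90-100.
Create new columns from the Category_ValueBand combinations and sum the values in each one.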

What is the best way to do this with N bands, given the numeric column "Value" and the categorical column "Category"?

Answer 1

You could use OneHotEncoder from scikit-learn.

But if you want to do it directly, something like this might work...

Load the data into a DataFrame:

import numpy as np
import pandas as pd
x = pd.read_csv('testData.csv')

Build a new column holding the combined labels. These will become the new column names later, but for now they are values such as 'a_0', 'a_1', and so on.

newCol_1 = x.Category.values
newCol_2 = (x.Value / 10).astype(int).astype(str).values
x['newCol'] = newCol_1 + '_' + newCol_2
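The division by 10 above hard-codes bands of width 10 starting at zero. For a general number of bands between the minimum and maximum, a minimal sketch using pd.cut might look like the following. The sample frame and the names df, n_bands and band are assumptions made for illustration, standing in for testData.csv.

```python
import pandas as pd

# Hypothetical sample data standing in for testData.csv
df = pd.DataFrame({
    "Category": ["a", "b", "c", "d", "a"],
    "Value": [0.1, 1.0, 100.0, 20.0, 0.5],
})

n_bands = 10  # assumed number of bands

# With an integer `bins`, pd.cut splits the range [min, max] into
# equal-width intervals; labels=False returns the integer band index.
df["band"] = pd.cut(df["Value"], bins=n_bands, labels=False).astype(int)

# Combine category and band index into a single label, e.g. 'a_0'
df["newCol"] = df["Category"] + "_" + df["band"].astype(str)

print(df)
```

Because the bands here are measured from the column minimum rather than from zero, the band indices can differ from those produced by the division-by-10 approach.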

Create a new column containing the sum of the values for each label.

newVals = x[['newCol', 'Value']].groupby('newCol').sum()                   # sum of Value per label
newVals.columns = ['newVals']                                              # change column names
x = pd.merge(x, newVals, how='left', left_on='newCol', right_index=True)   # merge with df


x.loc[:, ['newCol', 'newVals']]
Out[54]: 
  newCol  newVals
0    a_0      0.6
1    b_0      1.0
2   c_10    100.0
3    d_2     20.0
4    a_0      0.6

Spread 'newCol' out into one column per label...

for col in np.unique(x.newCol):
    x[col] = 0.0
    idx = (x.newCol == col)
    x.loc[idx, col] = x.newVals[idx]


x
Out[56]: 
  Category  Value  Count newCol  newVals  a_0  b_0  c_10  d_2
0        a    0.1      2    a_0      0.6  0.6    0     0    0
1        b    1.0      3    b_0      1.0  0.0    1     0    0
2        c  100.0      1   c_10    100.0  0.0    0   100    0
3        d   20.0      4    d_2     20.0  0.0    0     0   20
4        a    0.5      5    a_0      0.6  0.6    0     0    0
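The merge and the loop can probably be expressed more compactly. The sketch below is one possible alternative, not the method used above: it assumes groupby(...).transform("sum") for the per-label totals and pd.get_dummies for the one-column-per-label layout. The sample frame and the names df, wide and result are hypothetical.

```python
import pandas as pd

# Hypothetical sample data standing in for testData.csv
df = pd.DataFrame({
    "Category": ["a", "b", "c", "d", "a"],
    "Value": [0.1, 1.0, 100.0, 20.0, 0.5],
})

# Same labelling as above: category plus band index of width 10
df["newCol"] = df["Category"] + "_" + (df["Value"] / 10).astype(int).astype(str)

# Sum of Value within each label, broadcast back onto every row
df["newVals"] = df.groupby("newCol")["Value"].transform("sum")

# One 0/1 indicator column per label, scaled row-wise by the label total
wide = pd.get_dummies(df["newCol"], dtype=float).mul(df["newVals"], axis=0)

result = pd.concat([df, wide], axis=1)
print(result)
```

With about 1k categories and 10 bands this can create a very wide frame, so it may be worth checking how many distinct labels actually occur before spreading them into columns.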
