Binning values and using the bin labels to reference another DataFrame's index

Time: 2014-02-25 12:33:06

Tags: python pandas scipy apply binning

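I am trying to accomplish the following task.
What I have done so far: I have 8760 values, which I binned into a fixed set of intervals. There are 10 intervals. I then grouped the values by bin.

Problem: I now have to map each "level" of this DataFrame (df1) to an index label of another DataFrame (df2), so that I can perform a calculation row by row. That is, the 10 intervals point to the 10 index labels of the other DataFrame.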

bins=[-1,0,1,1.065,1.230,1.500,1.950,2.800,4.500,6.200,13.10]
arr=pd.cut(df1,bins)
grouped=df1.groupby(arr)
pd.value_counts(arr)


Out[58]:
(-1, 0]           4015  
(0, 1]            1948  
(1.95, 2.8]       646  
(2.8, 4.5]        542  
(1.5, 1.95]       539  
(1.23, 1.5]       427  
(1.065, 1.23]     337  
(4.5, 6.2]        127  
(1, 1.065]        125  
(6.2, 13.1]        54  
dtype: int64  

Now I have to use this to reference the index of df2:
data={'f11':['0','0','-0.008','0.13','0.33','0.568','0.873','1.132','1.06','0.678'],'f12':['0','0','0.588','0.683','0.487','0.187','-0.392','-1.237','-1.6','-0.327'],'f13':['0','0','-0.062','-0.151','-0.221','-0.295','-0.362','-0.412','-0.359','-0.25'],'f21':['0','0','-0.06','-0.019','0.055','0.109','0.226','0.288','0.264','0.156'],'f22':['0','0','0.072','0.066','-0.064','-0.152','-0.462','-0.823','-1.127','-1.377'],'f23':['0','0','-0.022','-0.029','-0.026','-0.014','0.001','0.056','0.131','0.251']}  

df2=DataFrame(data,columns=['f11','f12','f13','f21','f22','f23'],index=['1','2','3','4','5','6','7','8','9','10'])

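Desired solution: (-1, 0] refers to index '1', (0, 1] refers to '2', and so on. The goal is to compute f11 + f12 + (f21 * f22 * f23) row by row for all 8760 values, using the coefficients from the referenced index.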

1 answer:

Answer 0 (score: 0)

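  1. Map the categories to integer indices

    # Use the ordered categories rather than arr.unique(), which returns the bins in
    # order of first appearance, so that the i-th bin always maps to integer i
    categories = pd.Series(arr).cat.categories
    mapping_dict = dict(zip(categories, np.arange(len(categories))))

    # Convert to plain integers (assumes every value falls inside one of the bins)
    category_as_int = pd.Series(arr).map(mapping_dict).astype(int)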

  2. Add category_as_int to df1 as a column

    df1 = pd.DataFrame(df1)  # converts df1 to a DataFrame if it is a Series

    df1['key'] = category_as_int

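  3. Merge df1 and df2 (note the change to df2's index)

    # Index df2 by 0..9 so it lines up with the integer keys. len(data) would give the
    # number of columns (6), so use the length of one column instead. The values in
    # data are strings, so convert them to float for the arithmetic in step 4.
    df2 = pd.DataFrame(data, columns=['f11','f12','f13','f21','f22','f23'], index=np.arange(len(data['f11']))).astype(float)

    df = pd.merge(df1, df2, left_on='key', right_index=True, how='left')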

  4. Perform the operation on all 8K+ rows

    df.f11 + df.f12 + (df.f21 * df.f22 * df.f23)
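
The four steps above can also be sketched end to end. This is only a minimal illustration, not the asker's actual data: the `values` Series is a hypothetical stand-in filled with random numbers, and the column names `value`, `key` and `result` are assumptions. It keeps the original '1'..'10' string index of df2 and uses `pd.cut(..., labels=False)`, which returns the position of each bin directly, in place of the mapping dictionary.

```python
import numpy as np
import pandas as pd

bins = [-1, 0, 1, 1.065, 1.230, 1.500, 1.950, 2.800, 4.500, 6.200, 13.10]

# Hypothetical stand-in for the 8760 values in df1 (random, not the asker's data).
rng = np.random.default_rng(0)
values = pd.Series(rng.uniform(-0.99, 13.0, size=8760), name="value")

# Coefficient table from the question; the strings are converted to floats.
data = {
    'f11': ['0', '0', '-0.008', '0.13', '0.33', '0.568', '0.873', '1.132', '1.06', '0.678'],
    'f12': ['0', '0', '0.588', '0.683', '0.487', '0.187', '-0.392', '-1.237', '-1.6', '-0.327'],
    'f13': ['0', '0', '-0.062', '-0.151', '-0.221', '-0.295', '-0.362', '-0.412', '-0.359', '-0.25'],
    'f21': ['0', '0', '-0.06', '-0.019', '0.055', '0.109', '0.226', '0.288', '0.264', '0.156'],
    'f22': ['0', '0', '0.072', '0.066', '-0.064', '-0.152', '-0.462', '-0.823', '-1.127', '-1.377'],
    'f23': ['0', '0', '-0.022', '-0.029', '-0.026', '-0.014', '0.001', '0.056', '0.131', '0.251'],
}
df2 = pd.DataFrame(data, index=[str(i) for i in range(1, 11)]).astype(float)

# labels=False yields the bin position 0..9 for each value
# (it would be NaN for a value outside the bins; none occur here).
bin_pos = pd.cut(values, bins, labels=False)

df1 = values.to_frame()
# Shift 0..9 to the '1'..'10' string labels used by df2's index.
df1["key"] = (bin_pos + 1).astype(int).astype(str)

# A left merge keeps one output row per input value, in the original order.
df = df1.merge(df2, left_on="key", right_index=True, how="left")
df["result"] = df.f11 + df.f12 + (df.f21 * df.f22 * df.f23)

print(df[["value", "key", "result"]].head())
```

With real data, `values` would be replaced by the existing df1 Series. If some values could fall outside the bins, the `astype(int)` conversion would fail on the resulting NaN entries, so those rows would need to be dropped or handled first.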