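I am trying to accomplish the following task:
What I have done so far: I have 8760 values, and I binned them into intervals. There are 10 intervals. I then grouped the values by interval.
Problem: I now have to refer each "level" of this DataFrame (df1) to an index label of another DataFrame (df2) in order to perform a calculation row by row. That is, the 10 intervals point to the 10 index labels of the other DataFrame.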
bins=[-1,0,1,1.065,1.230,1.500,1.950,2.800,4.500,6.200,13.10]
arr=pd.cut(df1,bins)
grouped=df1.groupby(arr)
pd.value_counts(arr)
Out[58]:
(-1, 0] 4015
(0, 1] 1948
(1.95, 2.8] 646
(2.8, 4.5] 542
(1.5, 1.95] 539
(1.23, 1.5] 427
(1.065, 1.23] 337
(4.5, 6.2] 127
(1, 1.065] 125
(6.2, 13.1] 54
dtype: int64
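Note that `value_counts` sorts by frequency, so the output above does not list the intervals in bin order. A minimal sketch of one way to see the counts in bin order follows; the `values` Series is a hypothetical stand-in for the 8760 values in df1, not the original data.

```python
import numpy as np
import pandas as pd

bins = [-1, 0, 1, 1.065, 1.230, 1.500, 1.950, 2.800, 4.500, 6.200, 13.10]

# Hypothetical stand-in for df1: 8760 random values inside the bin range.
rng = np.random.default_rng(0)
values = pd.Series(rng.uniform(-0.99, 13.0, size=8760))

arr = pd.cut(values, bins)

# sort_index() orders the counts by interval instead of by frequency.
counts_in_bin_order = arr.value_counts().sort_index()
print(counts_in_bin_order)
```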
Now I have to use this to refer to the index of (df2):
data={'f11':['0','0','-0.008','0.13','0.33','0.568','0.873','1.132','1.06','0.678'],'f12':['0','0','0.588','0.683','0.487','0.187','-0.392','-1.237','-1.6','-0.327'],'f13':['0','0','-0.062','-0.151','-0.221','-0.295','-0.362','-0.412','-0.359','-0.25'],'f21':['0','0','-0.06','-0.019','0.055','0.109','0.226','0.288','0.264','0.156'],'f22':['0','0','0.072','0.066','-0.064','-0.152','-0.462','-0.823','-1.127','-1.377'],'f23':['0','0','-0.022','-0.029','-0.026','-0.014','0.001','0.056','0.131','0.251']}
df2=DataFrame(data,columns=['f11','f12','f13','f21','f22','f23'],index=['1','2','3','4','5','6','7','8','9','10'])
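Solution needed: (-1, 0] should refer to index '1', (0, 1] to '2', and so on. The goal is to compute (f11 + f12 + (f21 * f22 * f23)) row by row for all 8760 values, using the row of df2 that each value's interval refers to.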
Answer 0: (score: 0)
Map the categories to integer indices
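import numpy as np
import pandas as pd
# arr.unique() returns the bins in order of first appearance, not in bin order,
# so use the ordered category codes instead: 0 for (-1, 0], 1 for (0, 1], ...
# Values that fall outside all bins get the code -1.
category_as_int = pd.Series(arr).cat.codes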
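As an alternative sketch for this step, `pd.cut` can return the integer position of each bin directly when called with `labels=False`, which avoids building a mapping at all. The `sample` values below are hypothetical and chosen only to hit a few different bins.

```python
import pandas as pd

bins = [-1, 0, 1, 1.065, 1.230, 1.500, 1.950, 2.800, 4.500, 6.200, 13.10]

# Hypothetical sample values, one per bin of interest.
sample = pd.Series([-0.5, 0.5, 1.03, 2.0, 10.0])

# labels=False yields the 0-based position of the bin each value falls into.
codes = pd.cut(sample, bins, labels=False)

# The category codes of the interval-labelled result give the same positions.
codes_from_categories = pd.cut(sample, bins).cat.codes
print(codes.tolist(), codes_from_categories.tolist())
```

Adding 1 to these positions gives the '1' to '10' labels used in the question's df2, if that index is kept as integers.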
Add category_as_int to df1 as a column
df1 = pd.DataFrame(df1)  # converts df1 to a DataFrame if it is a Series
df1['key'] = category_as_int
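Merge df1 and df2 (note the change to df2's index: it is now 0-based integers, one per row)
# The values in data are strings, so convert them to float before doing arithmetic.
df2 = pd.DataFrame(data, columns=['f11','f12','f13','f21','f22','f23'], index=np.arange(len(data['f11']))).astype(float)
df = pd.merge(df1, df2, left_on='key', right_index=True, how='left')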
Perform the operation on all 8K+ rows
df.f11 + df.f12 + (df.f21 * df.f22 * df.f23)
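The steps above can be put together in one small, self-contained sketch. It reuses the `bins` and `data` from the question, but the `values` Series is a hypothetical five-row stand-in for the 8760 values, and the column names `value`, `key`, and `result` are assumptions made for illustration.

```python
import pandas as pd

bins = [-1, 0, 1, 1.065, 1.230, 1.500, 1.950, 2.800, 4.500, 6.200, 13.10]
data = {
    'f11': ['0', '0', '-0.008', '0.13', '0.33', '0.568', '0.873', '1.132', '1.06', '0.678'],
    'f12': ['0', '0', '0.588', '0.683', '0.487', '0.187', '-0.392', '-1.237', '-1.6', '-0.327'],
    'f13': ['0', '0', '-0.062', '-0.151', '-0.221', '-0.295', '-0.362', '-0.412', '-0.359', '-0.25'],
    'f21': ['0', '0', '-0.06', '-0.019', '0.055', '0.109', '0.226', '0.288', '0.264', '0.156'],
    'f22': ['0', '0', '0.072', '0.066', '-0.064', '-0.152', '-0.462', '-0.823', '-1.127', '-1.377'],
    'f23': ['0', '0', '-0.022', '-0.029', '-0.026', '-0.014', '0.001', '0.056', '0.131', '0.251'],
}

# Coefficients as floats, with the default 0..9 integer index (one row per bin).
df2 = pd.DataFrame(data).astype(float)

# Hypothetical stand-in for the 8760 values.
values = pd.Series([-0.5, 0.5, 1.03, 1.1, 3.0], name='value')

df1 = values.to_frame()
df1['key'] = pd.cut(values, bins, labels=False)  # 0-based bin position

# Attach the matching coefficient row of df2 to every value.
df = df1.merge(df2, left_on='key', right_index=True, how='left')

# Row-by-row calculation from the question.
df['result'] = df.f11 + df.f12 + df.f21 * df.f22 * df.f23
print(df[['value', 'key', 'result']])
```

With `how='left'`, a value that falls outside every bin gets no matching coefficient row, so its `result` comes out as NaN and no row is silently dropped.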