Question

Hi, I'm hoping to get some help. I have a DataFrame df like this:

label  cell_name  hour  kpi1  kpi2
train  c1         1     10    20
train  c1         2     10    44
train  c1         3     11    33
train  c1         4     5     1
train  c1         5     2     6
test   c1         1     78    66
test   c1         2     45    2
test   c1         3     23    12
test   c1         4     65    45
test   c1         5     86    76

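My intention is to subtract a constant (say, 50) from the kpi1 and kpi2 columns of the test rows, divide the result by the same columns of the train rows (matched on cell_name and hour), and append the result to the original DataFrame so that the new columns look like this: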

label  cell_name  hour  kpi1  kpi2  kpi1_index  kpi2_index
train  c1         1     10    20
train  c1         2     10    44
train  c1         3     11    33
train  c1         4     5     1
train  c1         5     2     6
test   c1         1     78    66    2.8         0.8
test   c1         2     45    2     -0.5        -1.09
test   c1         3     23    12    -2.45       -1.15
test   c1         4     65    45    3           -5
test   c1         5     86    76    18          4.33

I tried the following code:

import pandas as pd
import os
rr=os.getcwd()
df=pd.read_excel(rr+'\\KPI_test_train.xlsx')
print(df.columns)


def f(x,y):
    return ((x-50)/y)     
df_grouped = df.groupby(['label'])
[dtest,dtrain]=[y for x,y in df_grouped]
dtest=dtest.groupby(['label','cell_name','hour']).sum()
dtrain=dtrain.groupby(['label','cell_name','hour']).sum()

for i in dtest.columns:
    dtest[i+'_index']=f(dtest[i],dtrain[i])

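The function f returns NaN for every row. That is a bit frustrating, given that pandas usually handles this kind of thing neatly. What is the built-in way to do this?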

Answer 1

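I think it is best here to process each DataFrame separately. First, extract the label column with DataFrame.pop and use it to filter the rows, then set cell_name and hour as a MultiIndex so the two frames align, and apply the formula to all values at once. Next, use DataFrame.add_suffix and DataFrame.join to attach the results to the test DataFrame. Finally, if you need a single DataFrame, use concat: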

lab = df.pop('label')
# Split by label; indexing on cell_name and hour lets the two frames align row by row
dtest = df[lab.eq('test')].set_index(['cell_name','hour'])
dtrain = df[lab.eq('train')].set_index(['cell_name','hour'])

# (test - 50) / train, stored in new *_index columns on the test rows
df = dtest.join(((dtest - 50) / dtrain).add_suffix('_index'))

df = (pd.concat([dtrain, df], keys=('train','test'), sort=False)
        .reset_index()
        .rename(columns={'level_0':'label'}))
print (df)
   label cell_name  hour  kpi1  kpi2  kpi1_index  kpi2_index
0  train        c1     1    10    20         NaN         NaN
1  train        c1     2    10    44         NaN         NaN
2  train        c1     3    11    33         NaN         NaN
3  train        c1     4     5     1         NaN         NaN
4  train        c1     5     2     6         NaN         NaN
5   test        c1     1    78    66    2.800000    0.800000
6   test        c1     2    45     2   -0.500000   -1.090909
7   test        c1     3    23    12   -2.454545   -1.151515
8   test        c1     4    65    45    3.000000   -5.000000
9   test        c1     5    86    76   18.000000    4.333333
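
As a side note on why the attempt in the question produced only NaN: pandas aligns operands on their index before doing arithmetic, and with label kept in the index no test key such as ('test', 'c1', 1) ever matches a train key such as ('train', 'c1', 1). The sketch below is only an illustration. It rebuilds the sample data inline instead of reading the Excel file, and the variable names (kpis, test_l, train_l, misaligned, index_values) are made up for this example.

```python
import pandas as pd

# Sample data from the question, built inline (assumption: the Excel file holds exactly this).
df = pd.DataFrame({
    "label": ["train"] * 5 + ["test"] * 5,
    "cell_name": ["c1"] * 10,
    "hour": [1, 2, 3, 4, 5] * 2,
    "kpi1": [10, 10, 11, 5, 2, 78, 45, 23, 65, 86],
    "kpi2": [20, 44, 33, 1, 6, 66, 2, 12, 45, 76],
})
kpis = ["kpi1", "kpi2"]
is_test = df["label"].eq("test")

# As in the question: 'label' is part of the index, so test and train keys never match.
test_l = df[is_test].set_index(["label", "cell_name", "hour"])[kpis]
train_l = df[~is_test].set_index(["label", "cell_name", "hour"])[kpis]
misaligned = (test_l - 50) / train_l
all_nan = bool(misaligned.isna().all().all())

# With 'label' left out of the index, the keys are (cell_name, hour) on both sides,
# so the division lines up row by row.
test = df[is_test].set_index(["cell_name", "hour"])[kpis]
train = df[~is_test].set_index(["cell_name", "hour"])[kpis]
index_values = (test - 50) / train

print(all_nan)
print(index_values.round(2))
```

For the first row this works out to (78 - 50) / 10 = 2.8 for kpi1, which matches the expected output in the question.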

Conditional difference, divided by the same column of a pandas DataFrame
