Normalize values in a data frame given unsorted other constraints

Time: 2017-02-20 09:51:22

Tags: python pandas normalization

I have a data frame that looks like this:

            counter leg_rate pose_rate component    approach      rmse
0   proc/stat-stime        d         d      test    Baseline  1.583097
1   proc/stat-stime        d         r      test  AEW - MTEN  0.516108
2   proc/stat-stime        d         d      test        ASDF  0.705861
3   proc/stat-stime        r         r      test        ASDF  0.345816
4   proc/stat-utime        d         r      test    Baseline  1.128632
5   proc/stat-stime        d         r      test    Baseline  1.579803
6   proc/stat-stime        r         r      test    Baseline  1.345895
7   proc/stat-utime        r         r      test  AEW - MTEN  0.187236
8   proc/stat-utime        d         d      test    Baseline  1.193776
9   proc/stat-stime        r         d      test        ASDF  0.014975
10  proc/stat-utime        r         r      test        ASDF  0.985493
11  proc/stat-utime        r         d      test  AEW - MTEN  0.897336
12  proc/stat-stime        r         d      test    Baseline  1.415103
13  proc/stat-utime        r         d      test    Baseline  1.724266
14  proc/stat-utime        r         r      test    Baseline  1.294654
15  proc/stat-utime        d         d      test  AEW - MTEN  0.263845
16  proc/stat-utime        r         d      test        ASDF  0.497368
17  proc/stat-stime        d         d      test  AEW - MTEN  0.143402
18  proc/stat-utime        d         r      test  AEW - MTEN  0.233437
19  proc/stat-stime        r         d      test  AEW - MTEN  0.431739
20  proc/stat-utime        d         r      test        ASDF  0.002475
21  proc/stat-stime        d         r      test        ASDF  0.331700
22  proc/stat-stime        r         r      test  AEW - MTEN  0.985123
23  proc/stat-utime        d         d      test        ASDF  0.464989

I want to normalize rmse by dividing it by the rmse of the approach named Baseline. In the end there should be a new column rmse-norm containing the respective normalized values. All other columns basically provide the context that needs to match when dividing. This means that the row

1   proc/stat-stime        d         r      test  AEW - MTEN  0.516108

needs to be divided by the row whose other columns match:

5   proc/stat-stime        d         r      test    Baseline  1.579803

There will always be a matching row with Baseline.

I have tried using groupby and doing all kinds of things with the index of the other columns, but because the other columns are not sorted, I could not come up with something concise that assigns the right values in the correct order.

1 answer:

Answer 0: (score: 2)

I think you can use:

# select the Baseline rows as a Series indexed by a MultiIndex of the context columns
cols = ['counter','leg_rate','pose_rate','component']
s = df[df.approach == 'Baseline'].set_index(cols)['rmse']
print(s)
counter          leg_rate  pose_rate  component
proc/stat-stime  d         d          test         1.583097
proc/stat-utime  d         r          test         1.128632
proc/stat-stime  d         r          test         1.579803
                 r         r          test         1.345895
proc/stat-utime  d         d          test         1.193776
proc/stat-stime  r         d          test         1.415103
proc/stat-utime  r         d          test         1.724266
                           r          test         1.294654
Name: rmse, dtype: float64
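# sort by the context columns so that the positional assignment below
# lines up with the index-aligned result of div
df = df.sort_values(cols)
# divide by s (aligned on the MultiIndex); .values drops the index so the
# result can be assigned positionally to the rmse column
df['rmse'] = df.set_index(cols)['rmse'].div(s).values
# print in the original row order
print(df.sort_index())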
            counter leg_rate pose_rate component    approach      rmse
0   proc/stat-stime        d         d      test    Baseline  1.000000
1   proc/stat-stime        d         r      test  AEW - MTEN  0.326691
2   proc/stat-stime        d         d      test        ASDF  0.445873
3   proc/stat-stime        r         r      test        ASDF  0.256941
4   proc/stat-utime        d         r      test    Baseline  1.000000
5   proc/stat-stime        d         r      test    Baseline  1.000000
6   proc/stat-stime        r         r      test    Baseline  1.000000
7   proc/stat-utime        r         r      test  AEW - MTEN  0.144622
8   proc/stat-utime        d         d      test    Baseline  1.000000
9   proc/stat-stime        r         d      test        ASDF  0.010582
10  proc/stat-utime        r         r      test        ASDF  0.761202
11  proc/stat-utime        r         d      test  AEW - MTEN  0.520416
12  proc/stat-stime        r         d      test    Baseline  1.000000
13  proc/stat-utime        r         d      test    Baseline  1.000000
14  proc/stat-utime        r         r      test    Baseline  1.000000
15  proc/stat-utime        d         d      test  AEW - MTEN  0.221017
16  proc/stat-utime        r         d      test        ASDF  0.288452
17  proc/stat-stime        d         d      test  AEW - MTEN  0.090583
18  proc/stat-utime        d         r      test  AEW - MTEN  0.206832
19  proc/stat-stime        r         d      test  AEW - MTEN  0.305094
20  proc/stat-utime        d         r      test        ASDF  0.002193
21  proc/stat-stime        d         r      test        ASDF  0.209963
22  proc/stat-stime        r         r      test  AEW - MTEN  0.731946
23  proc/stat-utime        d         d      test        ASDF  0.389511
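
If relying on the sort order feels fragile, a merge on the context columns may be easier to reason about. The following is only a minimal sketch, not part of the solution above: the small sample frame and the helper column name rmse_baseline are made up for illustration, and it writes to a new rmse-norm column as the question asks, assuming exactly one Baseline row per context.

```python
import pandas as pd

# Hypothetical sample with the same columns as the question's frame.
df = pd.DataFrame({
    'counter':   ['proc/stat-stime'] * 3 + ['proc/stat-utime'] * 2,
    'leg_rate':  ['d', 'd', 'd', 'r', 'r'],
    'pose_rate': ['r', 'r', 'r', 'd', 'd'],
    'component': ['test'] * 5,
    'approach':  ['AEW - MTEN', 'Baseline', 'ASDF', 'Baseline', 'ASDF'],
    'rmse':      [0.5, 2.0, 1.0, 4.0, 1.0],
})

cols = ['counter', 'leg_rate', 'pose_rate', 'component']

# One Baseline rmse per context; rmse_baseline is an illustrative name.
baseline = (df.loc[df['approach'] == 'Baseline', cols + ['rmse']]
              .rename(columns={'rmse': 'rmse_baseline'}))

# A left merge keeps the rows of df in their original order.
# validate raises if a context has more than one Baseline row.
merged = df.merge(baseline, on=cols, how='left', validate='many_to_one')

# merge returns a fresh 0..n-1 index, so assign the values positionally.
df['rmse-norm'] = (merged['rmse'] / merged['rmse_baseline']).to_numpy()
print(df)
```

Because the merge matches on the key columns themselves, no sorting is needed before or after, and the original rmse column stays untouched.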

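Another solution with groupby and a custom function f:

def f(x):
    # rmse of the single Baseline row in this group
    baseline = x.loc[x['approach'] == 'Baseline', 'rmse'].item()
    x = x.copy()
    x['rmse'] = x['rmse'] / baseline
    return x

# group_keys=False keeps the original row index instead of prepending the group keys
df = df.groupby(['counter','leg_rate','pose_rate','component'], group_keys=False).apply(f)
print(df)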
            counter leg_rate pose_rate component    approach      rmse
0   proc/stat-stime        d         d      test    Baseline  1.000000
1   proc/stat-stime        d         r      test  AEW - MTEN  0.326691
2   proc/stat-stime        d         d      test        ASDF  0.445873
3   proc/stat-stime        r         r      test        ASDF  0.256941
4   proc/stat-utime        d         r      test    Baseline  1.000000
5   proc/stat-stime        d         r      test    Baseline  1.000000
6   proc/stat-stime        r         r      test    Baseline  1.000000
7   proc/stat-utime        r         r      test  AEW - MTEN  0.144622
8   proc/stat-utime        d         d      test    Baseline  1.000000
9   proc/stat-stime        r         d      test        ASDF  0.010582
10  proc/stat-utime        r         r      test        ASDF  0.761202
11  proc/stat-utime        r         d      test  AEW - MTEN  0.520416
12  proc/stat-stime        r         d      test    Baseline  1.000000
13  proc/stat-utime        r         d      test    Baseline  1.000000
14  proc/stat-utime        r         r      test    Baseline  1.000000
15  proc/stat-utime        d         d      test  AEW - MTEN  0.221017
16  proc/stat-utime        r         d      test        ASDF  0.288452
17  proc/stat-stime        d         d      test  AEW - MTEN  0.090583
18  proc/stat-utime        d         r      test  AEW - MTEN  0.206832
19  proc/stat-stime        r         d      test  AEW - MTEN  0.305094
20  proc/stat-utime        d         r      test        ASDF  0.002193
21  proc/stat-stime        d         r      test        ASDF  0.209963
22  proc/stat-stime        r         r      test  AEW - MTEN  0.731946
23  proc/stat-utime        d         d      test        ASDF  0.389511
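
The same groupby idea can probably be written without a custom function by broadcasting each group's Baseline value with transform. This is a sketch under the same assumption of one Baseline row per context; the sample frame is hypothetical, and the result goes into a new rmse-norm column rather than overwriting rmse.

```python
import pandas as pd

# Hypothetical sample with the same columns as the question's frame.
df = pd.DataFrame({
    'counter':   ['proc/stat-stime'] * 3 + ['proc/stat-utime'] * 2,
    'leg_rate':  ['d', 'd', 'd', 'r', 'r'],
    'pose_rate': ['r', 'r', 'r', 'd', 'd'],
    'component': ['test'] * 5,
    'approach':  ['AEW - MTEN', 'Baseline', 'ASDF', 'Baseline', 'ASDF'],
    'rmse':      [0.5, 2.0, 1.0, 4.0, 1.0],
})

cols = ['counter', 'leg_rate', 'pose_rate', 'component']

# Keep rmse only on Baseline rows; every other row becomes NaN.
baseline_only = df['rmse'].where(df['approach'] == 'Baseline')

# 'first' returns the first non-null value per group, and transform
# broadcasts it back to every row of that group, aligned on df's index.
baseline = baseline_only.groupby([df[c] for c in cols]).transform('first')

df['rmse-norm'] = df['rmse'] / baseline
print(df)
```

Since transform returns a Series aligned with the original index, the row order of df never changes.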