How to normalize n different datasets with a provided method that can only normalize one dataset

Posted: 2017-11-21 17:13:02

Tags: python numpy machine-learning deep-learning

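I have several sets of data that keep changing, and I want to normalize each set with a running mean method. Because each set has its own mean and standard deviation, I need to keep n separate Scalers (one per set) to compute them. I don't know how to keep different Scalers around: I've just been told that I can't simply import the class several times and give each import a different name. But I don't know what to do instead.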

In case it helps, my running-mean Scaler is listed below:

import numpy as np


class Scaler(object):
    """ Generate scale and offset based on running mean and stddev along axis=0

        offset = running mean
        scale = 1 / (stddev + 0.1) / 3 (i.e. 3x stddev = +/- 1.0)
    """

    def __init__(self, obs_dim):
        """
        Args:
            obs_dim: dimension of axis=1
        """
        self.vars = np.zeros(obs_dim)
        self.means = np.zeros(obs_dim)
        self.m = 0  # number of samples seen so far
        self.n = 0  # unused
        self.first_pass = True

    def update(self, x):
        """ Update running mean and variance (this is an exact method)
        Args:
            x: NumPy array, shape = (N, obs_dim)
        """
        if self.first_pass:
            # First batch: take its statistics directly
            self.means = np.mean(x, axis=0)
            self.vars = np.var(x, axis=0)
            self.m = x.shape[0]
            self.first_pass = False
        else:
            # Merge stored statistics with the new batch, weighted by sample count,
            # using E[x^2] = var + mean^2 for each part
            n = x.shape[0]
            new_data_var = np.var(x, axis=0)
            new_data_mean = np.mean(x, axis=0)
            new_data_mean_sq = np.square(new_data_mean)
            new_means = ((self.means * self.m) + (new_data_mean * n)) / (self.m + n)
            self.vars = (((self.m * (self.vars + np.square(self.means))) +
                          (n * (new_data_var + new_data_mean_sq))) / (self.m + n) -
                         np.square(new_means))
            self.vars = np.maximum(0.0, self.vars)  # occasionally goes negative, clip
            self.means = new_means
            self.m += n

    def get(self):
        """ returns 2-tuple: (scale, offset) """
        return 1/(np.sqrt(self.vars) + 0.1)/3, self.means
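The `else` branch of `update` is exact because it pools E[x^2] = var + mean^2 from the old and new data, weighted by sample count, and then subtracts the square of the pooled mean. A minimal standalone sketch of that merge step follows; the function name `merge_stats` is hypothetical (not part of the code above), and it assumes population variance (`ddof=0`, NumPy's default for `np.var`). It checks the merged result against NumPy computed over the concatenated data.

```python
import numpy as np


def merge_stats(mean_a, var_a, m, mean_b, var_b, n):
    """Combine the population mean/variance of two batches of sizes m and n.

    Mirrors the arithmetic in Scaler.update: each batch contributes
    count * (var + mean^2) to the pooled E[x^2].
    """
    mean = (mean_a * m + mean_b * n) / (m + n)
    ex2 = (m * (var_a + np.square(mean_a)) + n * (var_b + np.square(mean_b))) / (m + n)
    var = ex2 - np.square(mean)
    return mean, np.maximum(0.0, var)  # clip tiny negatives caused by round-off


rng = np.random.default_rng(0)
a = rng.normal(2.0, 3.0, size=(50, 4))
b = rng.normal(-1.0, 0.5, size=(30, 4))

mean, var = merge_stats(a.mean(axis=0), a.var(axis=0), len(a),
                        b.mean(axis=0), b.var(axis=0), len(b))
both = np.concatenate([a, b])
print(np.allclose(mean, both.mean(axis=0)), np.allclose(var, both.var(axis=0)))
```

Both comparisons come out `True`, which is why the class can claim to be an exact method rather than an approximation such as an exponential moving average.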

Thanks for any suggestions!

1 answer:

Answer 0 (score: 2)

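Create a list of Scaler instances, one per dataset. If each dataset has a different obs_dim, you can use [Scaler(obs_dim) for obs_dim in obs_dims]. If they all share one obs_dim, use [Scaler(obs_dim) for i in range(N)], where N is the number of datasets. You can then refer to each scaler by its index in the list and update it only with its own dataset.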
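A minimal sketch of this approach, assuming three datasets that share one `obs_dim` and arrive in batches. To keep the snippet self-contained, `RunningScaler` is a condensed, hypothetical stand-in with the same running mean/variance math and the same `(scale, offset)` return value as the question's `Scaler`; with the real class you would use `Scaler` directly. The normalization `(x - offset) * scale` is an assumption about how the two values are meant to be applied, inferred from the docstring (offset = running mean, 3x stddev maps to about +/- 1.0).

```python
import numpy as np


class RunningScaler:
    """Hypothetical condensed stand-in for the question's Scaler."""

    def __init__(self, obs_dim):
        self.means = np.zeros(obs_dim)
        self.vars = np.zeros(obs_dim)
        self.m = 0  # samples seen so far; with m == 0 the first update is exact too

    def update(self, x):
        n = x.shape[0]
        new_mean, new_var = x.mean(axis=0), x.var(axis=0)
        total = self.m + n
        mean = (self.means * self.m + new_mean * n) / total
        ex2 = (self.m * (self.vars + self.means ** 2) + n * (new_var + new_mean ** 2)) / total
        self.vars = np.maximum(0.0, ex2 - mean ** 2)
        self.means = mean
        self.m = total

    def get(self):
        return 1 / (np.sqrt(self.vars) + 0.1) / 3, self.means


obs_dim, n_datasets = 4, 3
# One independent object per dataset. The comprehension calls the constructor
# n_datasets times; [RunningScaler(obs_dim)] * n_datasets would instead hold
# n_datasets references to a single shared object.
scalers = [RunningScaler(obs_dim) for _ in range(n_datasets)]

rng = np.random.default_rng(1)
for _ in range(5):
    for i, scaler in enumerate(scalers):
        # Dataset i (illustrative data: mean 10*i, stddev i+1) only updates scalers[i]
        batch = rng.normal(loc=10.0 * i, scale=i + 1.0, size=(20, obs_dim))
        scaler.update(batch)

# Normalize a fresh batch from dataset 2 with dataset 2's own statistics
scale, offset = scalers[2].get()
x = rng.normal(loc=20.0, scale=3.0, size=(8, obs_dim))
x_norm = (x - offset) * scale
print([s.m for s in scalers], x_norm.shape)
```

If the datasets are easier to identify by name than by position, a dict comprehension such as `{name: Scaler(obs_dim) for name in names}` (where `names` is your own list of labels) works the same way.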