I'm working on a signal classification problem and would like to scale the dataset matrix first, but my data is in a 3D format (batch, length, channels).
I tried to use the scikit-learn StandardScaler:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
But I got the following error message:
Found array with dim 3. StandardScaler expected <= 2.
I think one solution would be to split the matrix by channel into multiple 2D matrices, scale them separately, and then stack them back into 3D format, but I wonder whether there is a better solution.
Thank you very much.
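For reference, a minimal sketch of the per-channel workaround described above (assuming `X_train` is a NumPy array of shape (batch, length, channels), with made-up data):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(size=(8, 100, 3))  # (batch, length, channels), made-up data

# Split into one 2D matrix per channel, scale each, then stack back to 3D.
# Note: the fitted scalers are discarded here; a real version would keep
# them in order to apply the same transform to X_test.
channels = [StandardScaler().fit_transform(X_train[:, :, c])
            for c in range(X_train.shape[2])]
X_scaled = np.stack(channels, axis=-1)
print(X_scaled.shape)  # (8, 100, 3)
```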
Answer 0 (score: 5)
Only 3 lines of code...
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train.reshape(-1, X_train.shape[-1])).reshape(X_train.shape)
X_test = scaler.transform(X_test.reshape(-1, X_test.shape[-1])).reshape(X_test.shape)
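This works because the reshape stacks all batch × length time steps as rows, so StandardScaler sees each channel as one feature column. A quick sanity check with made-up data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X_train = rng.normal(loc=5.0, scale=2.0, size=(16, 50, 3))  # made-up data

scaler = StandardScaler()
# Collapse (batch, length) into rows, scale per channel, restore the shape
X_scaled = scaler.fit_transform(
    X_train.reshape(-1, X_train.shape[-1])
).reshape(X_train.shape)

print(X_scaled.shape)            # (16, 50, 3) -- shape is preserved
print(X_scaled[:, :, 0].mean())  # each channel now has mean ~0
print(X_scaled[:, :, 0].std())   # and std ~1
```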
Answer 1 (score: 4)
You have to fit and store a scaler for each channel:
from sklearn.preprocessing import StandardScaler

scalers = {}
for i in range(X_train.shape[2]):
    scalers[i] = StandardScaler()
    X_train[:, :, i] = scalers[i].fit_transform(X_train[:, :, i])

for i in range(X_test.shape[2]):
    X_test[:, :, i] = scalers[i].transform(X_test[:, :, i])
Answer 2 (score: 3)
If you want to scale each feature individually, the way StandardScaler does, you can use the following:
import numpy as np
from sklearn.base import TransformerMixin
from sklearn.preprocessing import StandardScaler

class NDStandardScaler(TransformerMixin):
    def __init__(self, **kwargs):
        self._scaler = StandardScaler(copy=True, **kwargs)
        self._orig_shape = None

    def fit(self, X, **kwargs):
        X = np.array(X)
        # Save the original shape to reshape the flattened X later
        # back to its original shape
        if len(X.shape) > 1:
            self._orig_shape = X.shape[1:]
        X = self._flatten(X)
        self._scaler.fit(X, **kwargs)
        return self

    def transform(self, X, **kwargs):
        X = np.array(X)
        X = self._flatten(X)
        X = self._scaler.transform(X, **kwargs)
        X = self._reshape(X)
        return X

    def _flatten(self, X):
        # Reshape X to <= 2 dimensions
        if len(X.shape) > 2:
            n_dims = np.prod(self._orig_shape)
            X = X.reshape(-1, n_dims)
        return X

    def _reshape(self, X):
        # Reshape X back to its original shape
        if len(X.shape) >= 2:
            X = X.reshape(-1, *self._orig_shape)
        return X
It simply flattens the input's features before passing them to sklearn's StandardScaler, then reshapes them back afterwards. Usage is the same as with StandardScaler:
data = [[[0, 1], [2, 3]], [[1, 5], [2, 9]]]
scaler = NDStandardScaler()
print(scaler.fit_transform(data))
prints
[[[-1. -1.]
[ 0. -1.]]
[[ 1. 1.]
[ 0. 1.]]]
The arguments with_mean and with_std are passed directly to StandardScaler and therefore work as expected. copy=False does not work, since the reshaping does not happen in place. For 2-D input, NDStandardScaler works like StandardScaler:
data = [[0, 0], [0, 0], [1, 1], [1, 1]]
scaler = NDStandardScaler()
scaler.fit(data)
print(scaler.transform(data))
print(scaler.transform([[2, 2]]))
prints
[[-1. -1.]
[-1. -1.]
[ 1. 1.]
[ 1. 1.]]
[[3. 3.]]
just like in sklearn's example for StandardScaler.
Answer 3 (score: 1)
An elegant way is to use class inheritance, as follows:
from sklearn.preprocessing import MinMaxScaler
import numpy as np

class MinMaxScaler3D(MinMaxScaler):
    def fit_transform(self, X, y=None):
        x = np.reshape(X, newshape=(X.shape[0] * X.shape[1], X.shape[2]))
        return np.reshape(super().fit_transform(x, y=y), newshape=X.shape)
Usage:
scaler = MinMaxScaler3D()
X = scaler.fit_transform(X)
Answer 4 (score: 0)
s0, s1, s2 = y_train.shape[0], y_train.shape[1], y_train.shape[2]
y_train = y_train.reshape(s0 * s1, s2)
y_train = minMaxScaler.fit_transform(y_train)
y_train = y_train.reshape(s0, s1, s2)
s0, s1, s2 = y_test.shape[0], y_test.shape[1], y_test.shape[2]
y_test = y_test.reshape(s0 * s1, s2)
y_test = minMaxScaler.transform(y_test)
y_test = y_test.reshape(s0, s1, s2)
Reshape the data like this. For zero-padded data, use something like the following (x_train[0::s1] fits the scaler on every s1-th row, i.e. the first time step of each sample):
s0, s1, s2 = x_train.shape[0], x_train.shape[1], x_train.shape[2]
x_train = x_train.reshape(s0 * s1, s2)
minMaxScaler.fit(x_train[0::s1])
x_train = minMaxScaler.transform(x_train)
x_train = x_train.reshape(s0, s1, s2)
s0, s1, s2 = x_test.shape[0], x_test.shape[1], x_test.shape[2]
x_test = x_test.reshape(s0 * s1, s2)
x_test = minMaxScaler.transform(x_test)
x_test = x_test.reshape(s0, s1, s2)
Answer 5 (score: 0)
If you are working with pipelines, you can use this class:
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import StandardScaler

class Scaler(BaseEstimator, TransformerMixin):
    def __init__(self):
        self.scaler = StandardScaler()

    def fit(self, X, y=None):
        self.scaler.fit(X.reshape(X.shape[0], -1))
        return self

    def transform(self, X):
        return self.scaler.transform(X.reshape(X.shape[0], -1)).reshape(X.shape)
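Dropping a transformer like this into an sklearn Pipeline might look as follows (the class is repeated so the snippet runs on its own; the data is made up). Note that flattening per sample means each (time step, channel) position is standardized across samples, which differs from the per-channel answers above:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import StandardScaler

class Scaler(BaseEstimator, TransformerMixin):
    def __init__(self):
        self.scaler = StandardScaler()

    def fit(self, X, y=None):
        # Flatten each sample to one row before fitting
        self.scaler.fit(X.reshape(X.shape[0], -1))
        return self

    def transform(self, X):
        return self.scaler.transform(X.reshape(X.shape[0], -1)).reshape(X.shape)

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 20, 2))  # made-up (batch, length, channels) data

pipe = Pipeline([("scale", Scaler())])
X_scaled = pipe.fit_transform(X)
print(X_scaled.shape)  # (6, 20, 2)
```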
Answer 6 (score: 0)
I used a normalization scheme for spatiotemporal data of shape (2500, 512, 642) --> (samples, time steps, features/spatial locations). The following code can be used for the normalization and its inverse.
import numpy as np

def Normalize_data(data):
    scaled_data = []
    max_values = []
    min_values = []
    for N in range(data.shape[0]):
        temp = []
        t1 = []
        t2 = []
        for i in range(data.shape[1]):
            max_val = np.max(data[N, i])
            min_val = np.min(data[N, i])
            norm = (data[N, i] - min_val) / (max_val - min_val)
            temp.append(norm)
            t1.append(max_val)
            t2.append(min_val)
        scaled_data.append(temp)
        max_values.append(t1)
        min_values.append(t2)
    return (np.array(scaled_data), np.array(max_values), np.array(min_values))

def InverseNormalize_data(scaled_data, max_values, min_values):
    res_data = []
    for N in range(scaled_data.shape[0]):
        temp = []
        for i in range(scaled_data.shape[1]):
            max_val = max_values[N, i]
            min_val = min_values[N, i]
            orig = (scaled_data[N, i] * (max_val - min_val)) + min_val
            temp.append(orig)
        res_data.append(temp)
    return np.array(res_data)
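The scheme above is a per-sample, per-time-step min-max normalization. A vectorized equivalent (a sketch with made-up data, not the original code) makes it easy to check that the inverse round-trips back to the original array:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(5, 8, 12))  # made-up (samples, time steps, features)

# Min and max over the feature axis, per sample and per time step
max_values = data.max(axis=2)  # shape (5, 8)
min_values = data.min(axis=2)  # shape (5, 8)
scaled = (data - min_values[..., None]) / (max_values - min_values)[..., None]

# Inverse transform: recover the original values
restored = scaled * (max_values - min_values)[..., None] + min_values[..., None]
print(np.allclose(restored, data))  # True
```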