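I want to check a bag-of-words representation for multicollinearity before fitting a logistic regression. To do this I need to add noise to the matrix, i.e. values drawn from N(0, 0.1). I want to compare the model weights before and after adding the noise: if the weights differ a lot, I will know that multicollinearity is present. I convert the text to a matrix as follows: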
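from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import StandardScaler

count_vect = CountVectorizer()  # in scikit-learn
final_counts = count_vect.fit_transform(data['CleanedText'].values)
standardized_data = StandardScaler(with_mean=False).fit_transform(final_counts)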
standardized_data (a sparse matrix) looks like this:
(0, 232) 5.28663039106
(0, 1026) 2.09754160944
(0, 4351) 47.1484208356
(0, 4894) 3.62576585703
(0, 6326) 17.496202036
(0, 7585) 12.2994564729
(0, 9033) 55.0542695865
(0, 9480) 5.60252663694
(0, 9489) 34.3093270041
Can anyone tell me how to add noise to the matrix and obtain the weights?
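One possible way to do this is sketched below, under several assumptions that are not in the question: the corpus `texts` and the `labels` are made-up toy data, the helper names `add_noise_to_nonzeros` and `fit_weights` are hypothetical, 0.1 is read as the standard deviation of the noise (it could equally be meant as the variance), and the noise is added only to the stored non-zero entries so that the matrix stays sparse. Adding noise to every cell would turn the matrix dense, which may not fit in memory for a real vocabulary.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data standing in for data['CleanedText'] and its labels.
texts = [
    "good tasty product",
    "great taste good value",
    "bad stale product",
    "terrible taste bad value",
    "good fresh product great",
    "bad awful stale taste",
]
labels = np.array([1, 1, 0, 0, 1, 0])

counts = CountVectorizer().fit_transform(texts)
X = StandardScaler(with_mean=False).fit_transform(counts)  # sparse CSR matrix


def add_noise_to_nonzeros(X, scale=0.1, seed=0):
    """Return a copy of sparse X with N(0, scale) noise added to its stored entries.

    Assumption: `scale` is the standard deviation. Zero entries are left
    untouched so the result keeps the same sparsity pattern.
    """
    rng = np.random.default_rng(seed)
    X_noisy = X.astype(np.float64)  # astype returns a copy by default
    X_noisy.data += rng.normal(loc=0.0, scale=scale, size=X_noisy.data.shape)
    return X_noisy


def fit_weights(X, y):
    """Fit a logistic regression and return its weight vector (one weight per feature)."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, y)
    return clf.coef_.ravel()


X_noisy = add_noise_to_nonzeros(X, scale=0.1)

w_before = fit_weights(X, labels)
w_after = fit_weights(X_noisy, labels)

# Relative change per weight; the small constant avoids division by zero.
rel_change = np.abs(w_after - w_before) / (np.abs(w_before) + 1e-6)

# Indices of the features whose weights moved the most.
most_changed = np.argsort(rel_change)[::-1][:5]
print("largest relative changes at feature indices:", most_changed)
print("relative changes:", rel_change[most_changed])
```

Note that `LogisticRegression` applies L2 regularization by default (`C=1.0`), which tends to stabilize the weights of correlated features; if the goal is to expose multicollinearity, a larger `C` (weaker regularization) may make the weight differences easier to see.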