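I am using this answer to find correlation coefficients greater than a given limit f in a matrix (ndarray) of shape (29421, 11001), i.e. 29,421 rows and 11,001 columns.
I have modified the code as follows (the random bit picks which of the two correlated columns to remove; lines that correspond to the linked answer are marked with a trailing "###"):
The problem: I get thousands of correlation coefficients greater than 1. As far as I understand, that should not be possible?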
import numpy as np
import scipy.stats
from random import randint, random

rand = random()
rows = dataset_normalized.shape[0] ###
print("Rows: " + str(dataset_normalized.shape[0]) + ", Columns: " + str(dataset_normalized.shape[1]))
# Per-row mean as a column vector; the index is equivalent to [:, None].
ms = dataset_normalized.mean(axis=1)[(slice(None, None, None), None)] ###
datam = dataset_normalized - ms ###
# Note: scipy.stats.ss is deprecated and missing from recent SciPy releases;
# (datam ** 2).sum(axis=1) computes the same per-row sum of squares.
datass = np.sqrt(scipy.stats.ss(datam, axis=1)) ###
correlations = {}
percent_rand_one = 0
percent_rand_zero = 0
for i in range(rows): ###
    if(0 in datass[i:] or datass[i] == 0):
        continue
    else:
        temp = np.dot(datam[i:], datam[i].T) ###
        rs = temp / (datass[i:] * datass[i]) ###
        for counter, corr in enumerate(rs):
            # rs starts at row i, so the second row of the pair is i + counter.
            j = i + counter
            if(corr > 1 or corr < -1):
                # ERROR IS HERE: This is printing right now,
                # a lot, so I'm not sure what's happening?
                print("Correlation of " + str(corr) + " on " + str(i) + " and " + str(j) + ".")
                print("Something went wrong. Correlations calculated were either above 1 or below -1.")
            elif(corr > f or corr < -f):
                # Randomly pick which member of the correlated pair to record.
                rand_int = randint(1, 100)
                if(rand_int > 50):
                    correlations[j] = corr
                    percent_rand_one += 1
                else:
                    correlations[i] = corr
                    percent_rand_zero += 1
Any suggestions or ideas?
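For reference, a minimal sketch of the sanity check behind that expectation: by the Cauchy-Schwarz inequality, a Pearson coefficient computed from centered rows should lie in [-1, 1] up to floating-point rounding, and it should agree with `np.corrcoef`. The toy array `x`, the seed, and the helper name `row_correlations` are made up for illustration, and the sum of squares is computed with plain NumPy instead of `scipy.stats.ss`.

```python
import numpy as np


def row_correlations(data, i):
    """Pearson correlation of row i with rows i, i+1, ... of `data`.

    Hypothetical helper that mirrors the body of the loop above.
    """
    data = np.asarray(data, dtype=np.float64)
    centered = data - data.mean(axis=1, keepdims=True)   # subtract each row's mean
    norms = np.sqrt((centered ** 2).sum(axis=1))         # per-row root sum of squares
    return centered[i:] @ centered[i] / (norms[i:] * norms[i])


rng = np.random.default_rng(0)    # arbitrary seed
x = rng.normal(size=(6, 50))      # toy data: 6 rows, 50 columns

rs = row_correlations(x, 0)
reference = np.corrcoef(x)[0]     # np.corrcoef treats rows as variables by default
```

If `rs` and `reference` disagree on the real data, or values fall well outside [-1, 1], the input itself (dtype, array type, or which axis holds the variables) is worth checking before the formula.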
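Answer 0 (score: 0)
Figured it out... and it was the strangest thing. I just needed to swap the axes, so that the columns I want to compare become the rows the loop iterates over.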
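import warnings

import numpy as np
import scipy.stats

# Create correlations.
dataset_normalized_switched = np.swapaxes(dataset_normalized, 0, 1)  # columns become rows
columns = dataset_normalized_switched.shape[0] ### This is the major change...
ms = dataset_normalized_switched.mean(axis=1)[(slice(None, None, None), None)]
datam = dataset_normalized_switched - ms
# Note: scipy.stats.ss is deprecated and missing from recent SciPy releases;
# (datam ** 2).sum(axis=1) computes the same sum of squares.
datass = np.sqrt(scipy.stats.ss(datam, axis=1))
correlations = {}
for i in range(columns):
    temp = np.dot(datam[i:], datam[i].T)
    with warnings.catch_warnings():
        # Silence divide-by-zero warnings caused by constant columns (datass == 0).
        warnings.filterwarnings('ignore')
        rs = temp / (datass[i:] * datass[i])
        # index 0 is column i against itself; abs(value) < 1.1 drops invalid values.
        correlations[i] = [(index + i) for index, value in enumerate(rs) if (index != 0 and abs(value) < 1.1 and abs(value) > f)]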
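The axis swap matters because the loop correlates the rows of whatever array it is given, so an array with samples in rows and features in columns has to be transposed before its columns can be compared. Below is a small sketch of that idea on toy data, cross-checked against `np.corrcoef(..., rowvar=False)`. The array `data`, the threshold `f = 0.9`, and the helper name `correlated_columns` are assumptions for illustration and not part of the code above.

```python
import numpy as np


def correlated_columns(data, f):
    """Map each column index to the later columns with |r| > f (hypothetical helper)."""
    cols = np.asarray(data, dtype=np.float64).T          # columns become rows
    centered = cols - cols.mean(axis=1, keepdims=True)
    norms = np.sqrt((centered ** 2).sum(axis=1))
    result = {}
    for i in range(cols.shape[0]):
        # A constant column has norm 0; suppress the resulting divide warnings.
        with np.errstate(divide="ignore", invalid="ignore"):
            rs = centered[i:] @ centered[i] / (norms[i:] * norms[i])
        # Offset 0 is column i against itself; NaN never passes the comparison.
        result[i] = [i + k for k, r in enumerate(rs) if k != 0 and abs(r) > f]
    return result


rng = np.random.default_rng(1)                           # arbitrary seed
base = rng.normal(size=(200, 1))
data = np.hstack([
    base,                                                # column 0
    2.0 * base + 0.01 * rng.normal(size=(200, 1)),       # column 1: nearly a multiple of column 0
    rng.normal(size=(200, 2)),                           # columns 2 and 3: independent noise
])
f = 0.9
pairs = correlated_columns(data, f)
full = np.corrcoef(data, rowvar=False)                   # columns treated as variables
```

Here `pairs` should flag only column 1 as a neighbour of column 0, and every flagged pair can be looked up in `full` to confirm the coefficient. On a matrix as large as the one in the question, the full `np.corrcoef` result would be an 11001 x 11001 array, so the row-at-a-time loop trades speed for lower peak memory.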