Pairwise similarity comparison in MATLAB

Question

I have a matrix A containing events and the probabilities with which they occur. For example

A= [1, 0.6; 5, 0.3; 4, 0.1]

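Event 1 has a probability of 60%, event 5 has a probability of 30%, and event 4 has a probability of 10%.

Then I have a series of similar matrices (events and probabilities)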

B = [1,0.5; 3,0.4; 2,0.1]
C = [2,0.9; 4,0.1; 3,0]
D = [1,0.6; 5,0.3; 4,0.1]

I want to find a vector that shows the similarity between A and each of the other matrices.

SIM = [?,?,1]

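The first two elements hold the similarity between A and B and between A and C. The third element shows the similarity between A and D (1, because they are identical).

Do you have any suggestions on how to implement a function that performs this pairwise comparison between matrices?

Thanks a lot!!!

Please also consider the case where A = [3,1;5,0;2,0] (which is equivalent to A = [3,1;2,0;1,0], and so on).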

Answer 1

Function to compute the similarity between `A` and `B`

function SIM = SIMcalc(A,B)

%// Get joint unique events for A and B
unq_events = unique([A(:,1);B(:,1)]).'; %//'

%// Presence of events across joint unique events
event_tagA = bsxfun(@eq,A(:,1),unq_events);
event_tagB = bsxfun(@eq,B(:,1),unq_events);

%// Probabilities corresponding to each joint event
tagged_probA = sum(bsxfun(@times,A(:,2),event_tagA));
tagged_probB = sum(bsxfun(@times,B(:,2),event_tagB));

%// Set not-shared events as NaN
tagged_probA(~any(event_tagA))=nan;
tagged_probB(~any(event_tagB))=nan;

%// Get the similarity factors for each shared event. This is based on the
%// assumption that probabilities far apart must have a low shared
%// similarity factor. This factor is used later on to scale the
%// individual probabilities for A and B.
sim_factor = 1-abs(tagged_probA-tagged_probB);
tagged_probA_sim_scaled = tagged_probA.*sim_factor;
tagged_probB_sim_scaled = tagged_probB.*sim_factor;

%// Get a concatenated matrix of scaled probabilities
tagged_probAB_sim_scaled = [tagged_probA_sim_scaled;tagged_probB_sim_scaled];

%// Get a hybrid array of probabilities based on the mean of probabilities
%// across A and B. Notice that for cases with identical probabilities, the
%// hybrid values would stay the same.
hybrid_probAB = mean(tagged_probAB_sim_scaled);

%// Get the sum of hybrid values. Notice that the sum would result in a
%// value of 1 when we have identical probabilities for identical events
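%// Sum only the non-NaN entries (same result as nansum, without needing
%// the Statistics Toolbox)
SIM = sum(hybrid_probAB(~isnan(hybrid_probAB)));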

return;

Sample inputs for testing the similarity calculation

%// Case 1 - First example from the question with D replacing B.
%// The SIM value must be 1 as mentioned in the question
disp('------------- Case 1 -----------------')
A= [1, 0.6; 5, 0.3; 4, 0.1]
B = [1,0.6; 5,0.3; 4,0.1]
SIM = SIMcalc(A,B)

%// Case 2 - Slight change to the first example with event 5 being
%// replaced by event 2 in B
%// The SIM value must be less than 1 as mentioned in the question
disp('------------- Case 2 -----------------')
A= [1, 0.6; 5, 0.3; 4, 0.1]
B = [1,0.6; 2,0.3; 4,0.1]
SIM = SIMcalc(A,B)

%// Case 3 - As presented in the comments by OP, that the SIM value must be 0
disp('------------- Case 3 -----------------')
A =[3,1;2,0;1,0]
B =[2,1;1,0;4,0]
SIM = SIMcalc(A,B)

%// Case 4 - As asked by me and replied by OP that SIM must be 1
disp('------------- Case 4 -----------------')
A =[3,1;2,0;1,0]
B =[3,1;2,0;1,0]
SIM = SIMcalc(A,B)

%// Case 5 - Random case added on my own.
%// As can be seen, event 3 is common between A and B with identical
%// probabilities. Apart from event 3, only event 2 is common, but its
%// probabilities are far apart, so the net SIM value must be slightly
%// more than the identical probability of event 3, i.e. slightly more
%// than 0.55
disp('------------- Case 5 -----------------')
A =[3,0.55;2,0.95;1,0]
B =[3,0.55;2,0.05;4,0.4]
SIM = SIMcalc(A,B)

Results

------------- Case 1 -----------------
A =
    1.0000    0.6000
    5.0000    0.3000
    4.0000    0.1000
B =
    1.0000    0.6000
    5.0000    0.3000
    4.0000    0.1000
SIM =
     1
------------- Case 2 -----------------
A =
    1.0000    0.6000
    5.0000    0.3000
    4.0000    0.1000
B =
    1.0000    0.6000
    2.0000    0.3000
    4.0000    0.1000
SIM =
    0.7000
------------- Case 3 -----------------
A =
     3     1
     2     0
     1     0
B =
     2     1
     1     0
     4     0
SIM =
     0
------------- Case 4 -----------------
A =
     3     1
     2     0
     1     0
B =
     3     1
     2     0
     1     0
SIM =
     1
------------- Case 5 -----------------
A =
    3.0000    0.5500
    2.0000    0.9500
    1.0000         0
B =
    3.0000    0.5500
    2.0000    0.0500
    4.0000    0.4000
SIM =
    0.6000

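Explanation

Let us use case 5 to explain in detail the principles that determine the final scalar value measuring the similarity between A and B. It is recommended to run the code for this case and observe the values of the variables.

Input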

A =
    3.0000    0.5500
    2.0000    0.9500
    1.0000         0
B =
    3.0000    0.5500
    2.0000    0.0500
    4.0000    0.4000

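Step 1

Tag the probabilities of A and B against their corresponding events, so that events that are not shared are set to NaN. This gives us tagged_probA (first row) and tagged_probB (second row), whose values are shown below -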

Event 1  Event 2  Event 3  Event 4
   0      0.95     0.55     NaN
  NaN     0.05     0.55     0.4

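Step 2

Compute the absolute difference between the probabilities and subtract it from 1, so that a number close to 1 indicates high similarity. For event 3 in this example, the result is exactly 1. This forms the basis of the similarity criterion between A and B: we get 1 for identical probabilities and smaller values, within the range [0 1], as the probabilities move farther apart. This is stored in sim_factor -

sim_factor =
       NaN    0.1000    1.0000       NaN

Step 3

Scale the tagged probabilities of A and B using sim_factor. Thus, the tagged probabilities are scaled according to the similarity between A and B. These are -

tagged_probA_sim_scaled =
       NaN    0.0950    0.5500       NaN
tagged_probB_sim_scaled =
       NaN    0.0050    0.5500       NaN

Step 4

Since the final value should be just a scalar, we can take the mean of the tagged and scaled probabilities across A and B. For identical probabilities, the result is the same as the individual probability, as for event 3 in this example. For non-identical probabilities, the result is scaled down according to the dissimilarity between the probabilities of A and B. This is hybrid_probAB, as shown below -

hybrid_probAB =
       NaN    0.0500    0.5500       NaN

Step 5

Summing the non-NaN elements of hybrid_probAB gives the final scalar similarity value, which for this particular case is 0.6, i.e. less than 1. For cases with identical probabilities, it gives a perfect 1.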
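The five steps can be condensed into one expression. The notation here is introduced only for illustration and does not appear in the answer's code: S is the set of events listed in both A and B, and p_A(e), p_B(e) are the tagged probabilities of event e. The second line checks the formula against case 5.

```latex
\begin{align*}
\mathrm{SIM}(A,B) &= \sum_{e \in S} \bigl(1 - \lvert p_A(e) - p_B(e) \rvert\bigr)\,\frac{p_A(e) + p_B(e)}{2} \\
\text{Case 5, } S = \{2, 3\}:\quad
\mathrm{SIM} &= \underbrace{(1 - 0.9)\cdot\tfrac{0.95 + 0.05}{2}}_{\text{event 2}}
              + \underbrace{(1 - 0)\cdot\tfrac{0.55 + 0.55}{2}}_{\text{event 3}}
              = 0.05 + 0.55 = 0.6
\end{align*}
```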

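Closing remarks

Looking at the SIM values, they do follow the expected trend, so hopefully this works for your other cases too. To compute the similarity between A and the other arrays, run them as inputs.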
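To build the whole SIM vector from the question in one go, the same measure can be evaluated in a loop. The script below is a rough sketch rather than the answer's function: it recomputes the measure inline with accumarray so that it runs on its own, and it assumes the event labels are positive integers no larger than nEvents (a name made up for this example).

```matlab
% Matrices from the question
A = [1,0.6; 5,0.3; 4,0.1];
B = [1,0.5; 3,0.4; 2,0.1];
C = [2,0.9; 4,0.1; 3,0];
D = [1,0.6; 5,0.3; 4,0.1];

nEvents = 5;              % assumption: event labels are integers in 1..nEvents
others  = {B, C, D};
SIM     = zeros(1, numel(others));

% Probability and presence of each event in A
pA  = accumarray(A(:,1), A(:,2), [nEvents 1]);
inA = accumarray(A(:,1), 1,      [nEvents 1]) > 0;

for k = 1:numel(others)
    X   = others{k};
    pX  = accumarray(X(:,1), X(:,2), [nEvents 1]);
    inX = accumarray(X(:,1), 1,      [nEvents 1]) > 0;
    shared = inA & inX;                          % events listed in both matrices
    factor = 1 - abs(pA(shared) - pX(shared));   % similarity factor per shared event
    SIM(k) = sum(factor .* (pA(shared) + pX(shared)) / 2);
end
disp(SIM)
```

With SIMcalc.m saved on the path, the loop body could instead be the single call SIM(k) = SIMcalc(A, others{k}), which also removes the integer-label assumption.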

Answer 2

OK, so you pick a function that takes two matrices and outputs a scalar, and then you can use bsxfun:

similarity = @(x,y)(mean(mean(x./y)));
M = cat(3,B,C,D); %//Combine into a single 3D matrix
squeeze(sum(mean(bsxfun(similarity, A, M),2)))

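Note that the similarity function used here is probably not suitable for your data: the element-wise division x./y yields Inf (or NaN) wherever the second matrix contains a 0, and the measure is not symmetric. The point is only to illustrate that the function must take 2D matrices and output a scalar.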
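Since bsxfun is designed for element-wise binary functions, a plain arrayfun over the third dimension is one alternative way to apply a matrix-to-scalar function to every slice of the stack. The sketch below reuses the illustrative similarity function from this answer purely to show the mechanics, and it also makes the caveat about zeros visible, because C contains a probability of 0.

```matlab
% Matrices from the question
A = [1,0.6; 5,0.3; 4,0.1];
B = [1,0.5; 3,0.4; 2,0.1];
C = [2,0.9; 4,0.1; 3,0];
D = [1,0.6; 5,0.3; 4,0.1];

similarity = @(x,y) mean(mean(x./y));   % illustrative measure from the answer
M = cat(3, B, C, D);                    % stack the candidates along dimension 3

% Evaluate the scalar-valued function once per slice
SIM = arrayfun(@(k) similarity(A, M(:,:,k)), 1:size(M,3));
disp(SIM)   % the entry for C is Inf because of the division by zero
```

Any other function handle that maps two 2D matrices to a scalar can be dropped in for similarity without changing the rest.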

Answer 3

Have you considered building the full probability histogram over the 5 events? Say:

Ah=[0.6 0 0 0.1 0.3];
Bh=[0.5 0.1 0.4 0 0];
Ch=[0 0.9 0 0.1 0];
Dh=[0.6 0 0 0.1 0.3];

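Then you can compare them as vectors by concatenating them into a matrix and using pdist. Note that pdist returns distances, so identical rows give 0 rather than 1: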

m=[Ah; Bh; Ch; Dh];
sim=squareform(pdist(m,'cityblock'));
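The histograms do not have to be typed by hand, and the city-block distance can be turned into a similarity score in [0, 1]. The sketch below is one possible way to do both without the Statistics Toolbox. It assumes integer event labels in 1..nEvents (a name made up for this example) and uses 1 - d/2 as the similarity, which works because the city-block distance between two probability vectors that each sum to 1 lies between 0 and 2. That conversion is a choice made here, not part of the answer.

```matlab
% Matrices from the question
A = [1,0.6; 5,0.3; 4,0.1];
B = [1,0.5; 3,0.4; 2,0.1];
C = [2,0.9; 4,0.1; 3,0];
D = [1,0.6; 5,0.3; 4,0.1];

nEvents = 5;   % assumption: event labels are integers in 1..nEvents

% Turn an [event, probability] matrix into a 1-by-nEvents histogram
toHist = @(X) accumarray(X(:,1), X(:,2), [nEvents 1]).';

Ah     = toHist(A);
others = {B, C, D};
SIM    = zeros(1, numel(others));
for k = 1:numel(others)
    d      = sum(abs(Ah - toHist(others{k})));   % city-block (L1) distance
    SIM(k) = 1 - d/2;                            % map distance [0,2] to similarity [1,0]
end
disp(SIM)
```

Unlike the measure in Answer 1, this score also penalizes events that appear in only one of the two matrices, so the two approaches generally give different numbers for the same pair.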
