Scoring two ordered sequences of numbers to determine how similar they are

Time: 2017-05-19 08:27:19

Tags: algorithm computer-science pattern-recognition

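How would I score two sequences of numbers so that

5, 8, 28, 31 (differences of 3, 20 and 3)
6, 9, 26, 29 (differences of 3, 17 and 3)

are considered similar "enough", while a sequence such as

8, 11, 31, 34 (differences of 3, 20 and 3; errors of 3, 3, 3 and 3)

is considered too dissimilar to accept?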

The absolute errors of the second sequence (6, 9, 26, 29) against the first are

1, 1, 2, 2

which is low "enough" to accept.

If the error is too high, I want to be able to reject the sequence.
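To make the accept/reject idea concrete, here is a minimal sketch of the per-position absolute error with a threshold check. The helper names `absoluteErrors` and `isWithinTolerance`, and the tolerance of 2.5, are assumptions chosen for illustration only.

```javascript
// Hypothetical helper: absolute error at each position of two equal-length sequences.
function absoluteErrors(expected, actual) {
    if (expected.length !== actual.length) {
        throw new Error("sequences must have the same length");
    }
    return expected.map(function (value, i) {
        return Math.abs(actual[i] - value);
    });
}

// Accept only if every position is within maxError (an assumed, tunable tolerance).
function isWithinTolerance(expected, actual, maxError) {
    return absoluteErrors(expected, actual).every(function (err) {
        return err <= maxError;
    });
}

console.log(absoluteErrors([5, 8, 28, 31], [6, 9, 26, 29]));          // [1, 1, 2, 2]
console.log(isWithinTolerance([5, 8, 28, 31], [6, 9, 26, 29], 2.5));  // true
console.log(isWithinTolerance([5, 8, 28, 31], [8, 11, 31, 34], 2.5)); // false
```

With this tolerance the first actual sequence is accepted and the shifted one is rejected, matching the two cases described above.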

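To give a little background: these are timestamps marking when events arrive at a computer. The first sequence holds the expected arrival times and the second holds the actual arrival times. Given that the sequences are at least in the correct order, I need a tunable value that measures how similar the actual sequence is to the expected one, so that I can accept or reject it.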

If this were a set of numbers in which order did not matter, I could simply reject the second set based on its standard deviation.

Since that is not the case, I had the idea of measuring both the deviation and the positional error.

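The positional error should not exceed 3, although this limit should not be restricted to integers. It needs to be a decimal value, because realistic timestamps are floating-point numbers, or at least accurate to 6 decimal places.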
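One way to read "deviation and positional error" is to compare both the positions themselves and the gaps between consecutive values. The sketch below does that; the names `consecutiveGaps`, `maxAbsDifference` and `compareTimings` are hypothetical, and taking the maximum as the summary is an assumption. It works unchanged on floating-point timestamps.

```javascript
// Differences between consecutive entries, e.g. [5, 8, 28, 31] -> [3, 20, 3].
function consecutiveGaps(seq) {
    var gaps = [];
    for (var i = 1; i < seq.length; i++) {
        gaps.push(seq[i] - seq[i - 1]);
    }
    return gaps;
}

// Largest absolute element-wise difference between two equal-length arrays.
function maxAbsDifference(a, b) {
    var worst = 0;
    for (var i = 0; i < a.length; i++) {
        worst = Math.max(worst, Math.abs(a[i] - b[i]));
    }
    return worst;
}

// Hypothetical summary: worst positional error and worst gap error.
function compareTimings(expected, actual) {
    if (expected.length !== actual.length) {
        throw new Error("sequences must have the same length");
    }
    return {
        maxPositionError: maxAbsDifference(expected, actual),
        maxGapError: maxAbsDifference(consecutiveGaps(expected), consecutiveGaps(actual))
    };
}

console.log(compareTimings([5, 8, 28, 31], [6, 9, 26, 29]));  // maxPositionError 2, maxGapError 3
console.log(compareTimings([5, 8, 28, 31], [8, 11, 31, 34])); // maxPositionError 3, maxGapError 0
```

The second call shows why the gaps alone are not sufficient here: a sequence shifted uniformly by 3 has identical gaps, so only the positional error exposes it.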

It also needs to work equally well on longer series of numbers, or come with a variant that scores them fairly.

In a longer series, the positional error is still unlikely to exceed 3, so the per-position error should remain fairly low.
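A score that is averaged over the number of entries does not grow just because the series is longer. As a sketch under that assumption, the mean absolute error and the root-mean-square error are two common length-normalized choices; the function names below are hypothetical.

```javascript
// Shared validation for the two hypothetical scoring functions below.
function checkSameLength(expected, actual) {
    if (expected.length !== actual.length || expected.length === 0) {
        throw new Error("sequences must be non-empty and of equal length");
    }
}

// Mean absolute error: average distance per position, in the same units as the input.
function meanAbsoluteError(expected, actual) {
    checkSameLength(expected, actual);
    var total = 0;
    for (var i = 0; i < expected.length; i++) {
        total += Math.abs(actual[i] - expected[i]);
    }
    return total / expected.length;
}

// Root-mean-square error: like the mean, but penalizes large single errors more.
function rootMeanSquareError(expected, actual) {
    checkSameLength(expected, actual);
    var total = 0;
    for (var i = 0; i < expected.length; i++) {
        var diff = actual[i] - expected[i];
        total += diff * diff;
    }
    return Math.sqrt(total / expected.length);
}

console.log(meanAbsoluteError([5, 8, 28, 31], [6, 9, 26, 29]));    // 1.5
console.log(rootMeanSquareError([5, 8, 28, 31], [8, 11, 31, 34])); // 3
```

Either value can be compared against a floating-point threshold regardless of how many events the sequences contain.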

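Here is a partial solution I found. It computes Pearson's correlation coefficient at every position where x fits into y and returns the resulting series. It uses the expected-value form of the equation. The comments describe it fairly well.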

function getPearsonsCorrelation(x, y)
{
    /**
     * Pearson's correlation can be calculated in an alternative fashion as
     * p(x, y) = (E(xy) - E(x)*E(y)) / sqrt[(E(x^2) - (E(x))^2) * (E(y^2) - (E(y))^2)]
     * where p(x, y) is the Pearson's correlation result and E is the expected value:
     * E(x) = var expectedValue = 0; for(var i = 0; i < x.length; i ++){ expectedValue += x[i]*p[i] }
     *  where p[i] is the probability of that value occurring; every sample is equally likely, so p[i] = 1/n
     *  hence E(x) = (sum of all x values) / n. Multiplying the numerator and denominator by n^2
     *  gives the sum-based form used below, so the 1/n factors never have to be applied.
     * sqrt is the square root of the result in square brackets
     * ^2 means to the power of two, or rather just square that value
     **/
    var maxdelay = y.length - x.length; // we will calculate Pearson's correlation coefficient at every location x fits into y
    var xl = x.length;
    var results = [];

    for(var d = 0; d <= maxdelay; d++){
        var xy = [];
        var x2 = [];
        var y2 = [];
        var _y = y.slice(d, d + x.length); // take just the segment of y at delay

        for(var i = 0; i < xl; i ++){
            xy.push(x[i] * _y[i]); // x*y array
            x2.push(x[i] * x[i]); // x squareds array
            y2.push(_y[i] * _y[i]); // y squareds array
        }

        var sum_x = 0;
        var sum_y = 0;
        var sum_xy = 0;
        var sum_x2 = 0;
        var sum_y2 = 0;

        for(var i = 0; i < xl; i ++){
            sum_x += x[i]; // sum of x, i.e. n * E(x)
            sum_y += _y[i]; // sum of y, i.e. n * E(y)
            sum_xy += xy[i]; // sum of x*y, i.e. n * E(xy)
            sum_x2 += x2[i]; // sum of x squared, i.e. n * E(x^2)
            sum_y2 += y2[i]; // sum of y squared, i.e. n * E(y^2)
        }

        var numerator = xl * sum_xy - sum_x * sum_y; // n^2 * (E(xy) - E(x) * E(y))
        var denomLeftSide = xl * sum_x2 - sum_x * sum_x; // n^2 * (E(x^2) - (E(x))^2)
        var denomRightSide = xl * sum_y2 - sum_y * sum_y; // n^2 * (E(y^2) - (E(y))^2)
        var denom = Math.sqrt(denomLeftSide * denomRightSide); // the n^2 factors cancel against the numerator
        var pearsonsCorrelation = numerator / denom; // NaN if either segment is constant (denom is 0)

        results.push(pearsonsCorrelation);
    }
    return results;
}
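
One property of Pearson's correlation that matters for this use case is that it ignores a constant offset (and any positive scaling) of either sequence. The sketch below uses a compact, hypothetical `pearsonAligned` helper for two equal-length sequences, following the same sum-based formula as above, to show the effect on the example data.

```javascript
// Hypothetical single-window Pearson correlation for two equal-length sequences.
function pearsonAligned(x, y) {
    if (x.length !== y.length) {
        throw new Error("sequences must have the same length");
    }
    var n = x.length;
    var sumX = 0, sumY = 0, sumXY = 0, sumX2 = 0, sumY2 = 0;
    for (var i = 0; i < n; i++) {
        sumX += x[i];
        sumY += y[i];
        sumXY += x[i] * y[i];
        sumX2 += x[i] * x[i];
        sumY2 += y[i] * y[i];
    }
    var numerator = n * sumXY - sumX * sumY;
    var denom = Math.sqrt((n * sumX2 - sumX * sumX) * (n * sumY2 - sumY * sumY));
    return numerator / denom; // NaN when either sequence is constant
}

var expectedTimes = [5, 8, 28, 31];
// Shifted by 3 at every position: meant to be rejected, yet perfectly correlated.
console.log(pearsonAligned(expectedTimes, [8, 11, 31, 34]));
// Errors of 1, 1, 2, 2: meant to be accepted, yet slightly less correlated.
console.log(pearsonAligned(expectedTimes, [6, 9, 26, 29]));
```

On this data the sequence that should be rejected scores at least as high as the one that should be accepted, which suggests the correlation would need to be combined with a positional-error check rather than used alone.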

0 answers:

No answers yet