Confused: are the correlations in Statsample really "--"?

Time: 2014-02-03 16:00:16

Tags: ruby statistics

I'm very new to statsample and have some basic questions. Using this example data:

[[1, 2, 3, 3],[2, 3, 3, 5],[4, 1, 3, 4]]

I create a 4x4 statsample dataset named ds, and get the following output from each call:

        puts ds.summary

= Dataset 1
  Cases: 3
  Element:[actuals]
  == Vector 3
    n :3
    n valid:3
    factors:3
    mode: 3
    Distribution
+---+---+---------+
| 3 | 3 | 100.00% |
+---+---+---------+

  Element:[mids]
  == Vector 2
    n :3
    n valid:3
    factors:1,2,3
    mode: 2
    Distribution
+---+---+--------+
| 1 | 1 | 33.33% |
| 2 | 1 | 33.33% |
| 3 | 1 | 33.33% |
+---+---+--------+

  Element:[predicteds]
  == Vector 4
    n :3
    n valid:3
    factors:3,4,5
    mode: 3
    Distribution
+---+---+--------+
| 3 | 1 | 33.33% |
| 4 | 1 | 33.33% |
| 5 | 1 | 33.33% |
+---+---+--------+

  Element:[prediction_error]
  == Vector 5
    n :3
    n valid:3
    factors:0,1,2
    mode: 0
    Distribution
+---+---+--------+
| 0 | 1 | 33.33% |
| 1 | 1 | 33.33% |
| 2 | 1 | 33.33% |
+---+---+--------+

  Element:[uids]
  == Vector 1
    n :3
    n valid:3
    factors:1,2,4
    mode: 1
    Distribution
+---+---+--------+
| 1 | 1 | 33.33% |
| 2 | 1 | 33.33% |
| 4 | 1 | 33.33% |
+---+---+--------+

This seems reasonable, but:

cm = ds.correlation_matrix
puts cm.summary

gives this, which is confusing:

Correlation Matrix
+------------------+---------+-------+------------+------------------+-------+
|                  | actuals | mids  | predicteds | prediction_error | uids  |
+------------------+---------+-------+------------+------------------+-------+
| actuals          | 1.000   | --    | --         | --               | --    |
| mids             | --      | 1.000 | --         | --               | --    |
| predicteds       | --      | --    | 1.000      | --               | --    |
| prediction_error | --      | --    | --         | 1.000            | --    |
| uids             | --      | --    | --         | --               | 1.000 |
+------------------+---------+-------+------------+------------------+-------+

1 answer:

Answer 0: (score: 0)

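You created a dataset with nominal vectors rather than scale (numeric) vectors. Statsample does not produce a correlation between non-numeric vectors, which is why every off-diagonal cell in the matrix is shown as "--".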
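
For comparison, here is a minimal plain-Ruby sketch (no statsample involved) of the Pearson correlations you would expect once the columns are treated as numbers. The column order [uids, mids, actuals, predicteds] is an assumption inferred from the summary output in the question, and the pearson helper is a hypothetical name written for this illustration, not part of statsample.

```ruby
# Rows from the question. The column order is an assumption inferred
# from the factors listed in the summary output.
rows = [[1, 2, 3, 3], [2, 3, 3, 5], [4, 1, 3, 4]]
names = %w[uids mids actuals predicteds]
columns = names.zip(rows.transpose).to_h

# Pearson correlation of two equal-length numeric arrays.
# Returns nil when either array has zero variance, because r is
# undefined in that case. (Hypothetical helper, not a statsample API.)
def pearson(xs, ys)
  n = xs.size.to_f
  mean_x = xs.sum / n
  mean_y = ys.sum / n
  cov = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  var_x = xs.sum { |x| (x - mean_x)**2 }
  var_y = ys.sum { |y| (y - mean_y)**2 }
  return nil if var_x.zero? || var_y.zero?

  cov / Math.sqrt(var_x * var_y)
end

# Print one line per pair of columns.
names.combination(2) do |a, b|
  r = pearson(columns[a], columns[b])
  puts format('%-10s %-10s %s', a, b, r ? format('%.3f', r) : 'undefined')
end
```

Note that actuals is 3 in every case, so it has zero variance and its correlation with any other column is undefined however the vector is typed. Among the other columns, mids against predicteds comes out at 0.5.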