I'm very new to statsample and have some basic questions. Given this sample data:
[[1, 2, 3, 3],[2, 3, 3, 5],[4, 1, 3, 4]]
I create a 4x4 statsample dataset called ds and get the following output for each call:
puts ds.summary
gives
= Dataset 1
Cases: 3
Element:[actuals]
== Vector 3
n :3
n valid:3
factors:3
mode: 3
Distribution
+---+---+---------+
| 3 | 3 | 100.00% |
+---+---+---------+
Element:[mids]
== Vector 2
n :3
n valid:3
factors:1,2,3
mode: 2
Distribution
+---+---+--------+
| 1 | 1 | 33.33% |
| 2 | 1 | 33.33% |
| 3 | 1 | 33.33% |
+---+---+--------+
Element:[predicteds]
== Vector 4
n :3
n valid:3
factors:3,4,5
mode: 3
Distribution
+---+---+--------+
| 3 | 1 | 33.33% |
| 4 | 1 | 33.33% |
| 5 | 1 | 33.33% |
+---+---+--------+
Element:[prediction_error]
== Vector 5
n :3
n valid:3
factors:0,1,2
mode: 0
Distribution
+---+---+--------+
| 0 | 1 | 33.33% |
| 1 | 1 | 33.33% |
| 2 | 1 | 33.33% |
+---+---+--------+
Element:[uids]
== Vector 1
n :3
n valid:3
factors:1,2,4
mode: 1
Distribution
+---+---+--------+
| 1 | 1 | 33.33% |
| 2 | 1 | 33.33% |
| 4 | 1 | 33.33% |
+---+---+--------+
This seems reasonable, but:
cm = ds.correlation_matrix
puts cm.summary
gives this, which is confusing:
Correlation Matrix
+------------------+---------+-------+------------+------------------+-------+
|                  | actuals | mids  | predicteds | prediction_error | uids  |
+------------------+---------+-------+------------+------------------+-------+
| actuals          | 1.000   | --    | --         | --               | --    |
| mids             | --      | 1.000 | --         | --               | --    |
| predicteds       | --      | --    | 1.000      | --               | --    |
| prediction_error | --      | --    | --         | 1.000            | --    |
| uids             | --      | --    | --         | --               | 1.000 |
+------------------+---------+-------+------------+------------------+-------+
Answer 0 (score: 0)
You created a dataset with nominal vectors, not scale vectors. Therefore, the correlation between non-numeric vectors is always 0.
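
To see what the correlations look like once the columns are treated as numbers, here is a minimal plain-Ruby sketch that does not use statsample at all. It assumes the four columns of the sample data are uids, mids, actuals and predicteds, in that order (inferred from the "Vector 1" to "Vector 4" labels in the summary). The pearson helper is hypothetical, written only for this illustration, and is not part of statsample.

```ruby
# Pearson correlation of two equal-length numeric arrays.
# Returns nil when either array has zero variance, because r is undefined then.
def pearson(xs, ys)
  n = xs.size.to_f
  mean_x = xs.sum / n
  mean_y = ys.sum / n
  cov  = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  ss_x = xs.sum { |x| (x - mean_x)**2 }
  ss_y = ys.sum { |y| (y - mean_y)**2 }
  return nil if ss_x.zero? || ss_y.zero?
  cov / Math.sqrt(ss_x * ss_y)
end

# Sample data from the question; the column names are an assumption (see above).
rows    = [[1, 2, 3, 3], [2, 3, 3, 5], [4, 1, 3, 4]]
names   = %w[uids mids actuals predicteds]
columns = names.zip(rows.transpose).to_h

# Print a simple matrix; undefined correlations are shown as "--".
names.each do |a|
  cells = names.map do |b|
    r = pearson(columns[a], columns[b])
    r.nil? ? '--' : format('%.3f', r)
  end
  puts "#{a.ljust(10)} #{cells.join('  ')}"
end
```

With numeric columns, mids and predicteds correlate at 0.5, for example. Note that actuals is constant (3, 3, 3), so it has zero variance and its correlation with any other column stays undefined even for numeric data; this sketch reports that as "--".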