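I'm looking for an SSE (sum of squared errors) function for time series.
When using a k-medoids function such as pam from the cluster package, the returned object has no component holding the SSE of the clustering. That said, I wrote a simple function that uses the distance matrix (which is already an argument to pam), but it is not very fast when the input is a huge matrix of around 400 million elements.
For that reason: is there a package that already provides an SSE function for this situation? Or how can I improve my function? An example of the input would be:
( ts dataset or the d.matrix, the clustering classes )
I googled "SSE R function" but had no success finding a package.
So, the function I have so far is this: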
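generate.sse <- function(dmatrix, groups)
{
  # Accumulate in `total` so the base function sum() is not shadowed.
  total <- 0
  for (i in seq_len(nrow(dmatrix)))
  {
    # Add the distances from observation i to every member of its own cluster.
    total <- total + sum(dmatrix[i, groups == groups[i]])
  }
  return(total)
}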
As I said, it is not optimal and is therefore slow, mainly because of the way it uses for and sum, in a C-like style.
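For a reproducible example, here is a sample time series (the dmatrix can be obtained with the cor function) and a set of cluster groups such as the pam function might return. Both were produced with dput.
Time series: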
structure(c(0, 311, 0, 739, 277, 841, 0, 548, 177, 36, 0, 0,
0, 0, 1268, 30, 547, 0, 95, 0, 2, 0, 0, 0, 2344, 0, 8, 574, 0,
0, 0, 589, 0, 0, 40, 3, 0, 2, 0, 0, 28, 704, 0, 46, 0, 1, 0,
52, 1374, 0, 2298, 0, 2, 0, 27, 827, 12, 0, 19, 0, 10, 815, 0,
106, 0, 0, 0, 852, 1113, 0, 79, 12, 0, 0, 0, 914, 0, 0, 0, 69,
0, 0, 78, 0, 51, 0, 1841, 0, 314, 1047, 0, 0, 0, 1522, 2251,
0, 55, 0, 0, 0, 20, 1338, 762, 1462, 972, 1877, 3, 1717, 743,
4, 0, 29, 0, 658, 5436, 65, 981, 11, 1866, 20, 147, 29, 134,
23, 7241, 144, 0, 553, 152, 827, 300, 80, 252, 58, 159, 274,
2687, 69, 4198, 115, 9, 5046, 0, 4, 60, 122, 20, 309, 58, 377,
7045, 0, 996, 387, 311, 4911, 8, 22, 124, 119, 324, 2260, 22,
1062, 0, 10, 28, 2379, 5545, 370, 222, 178, 208, 189, 24, 2917,
851, 1012, 43, 128, 1756, 484, 0, 1275, 144, 630, 7120, 698,
0, 3, 0, 787, 24, 7305, 6495, 584, 476, 226, 332, 15, 18, 1087,
1022, 1946, 1029, 2120, 16, 1623, 678, 29, 0, 31, 0, 604, 5590,
50, 950, 0, 1758, 17, 141, 27, 1, 22, 7846, 181, 0, 1637, 41,
942, 332, 51, 211, 38, 138, 261, 2379, 69, 4297, 113, 36, 4836,
3, 231, 73, 84, 24, 357, 56, 367, 7920, 2, 1114, 250, 228, 4509,
6, 19, 181, 68, 759, 2048, 21, 959, 0, 12, 38, 2388, 5501, 385,
264, 272, 145, 90, 94, 2974, 948, 1260, 42, 204, 1688, 297, 0,
1145, 221, 24, 8042, 442, 0, 0, 0, 837, 17, 6015, 6649, 518,
534, 523, 314, 26, 18, 1420, 1517, 1570, 986, 2665, 17, 1587,
683, 23, 16, 23, 0, 500, 5814, 10, 992, 0, 2185, 23, 120, 19,
17, 28, 4043, 102, 0, 1872, 110, 852, 256, 66, 265, 40, 132,
307, 2575, 69, 1, 63, 44, 5734, 0, 120, 124, 27, 17, 364, 66,
545, 8229, 1, 970, 212, 164, 4683, 11, 22, 68, 49, 414, 1028,
23, 1060, 0, 18, 21, 2234, 4775, 387, 242, 212, 158, 118, 166,
1118, 782, 1318, 122, 422, 2153, 341, 61, 1094, 164, 566, 7264,
515, 0, 0, 0, 811, 21, 5946, 6675, 528, 700, 447, 328, 25, 10,
1319, 1530, 1564, 966, 2085, 13, 1296, 695, 45, 17, 26, 76, 504,
6335, 9, 928, 0, 2248, 26, 165, 0, 40, 15, 9348, 140, 0, 1904,
183, 0, 115, 13, 254, 0, 180, 253, 3524, 64, 4043, 97, 18, 4505,
0, 235, 69, 42, 18, 647, 516, 477, 8544, 3, 894, 256, 299, 4873,
7, 20, 117, 31, 321, 1897, 23, 1606, 0, 16, 32, 2451, 5031, 419,
258, 149, 146, 71, 155, 2860, 622, 0, 0, 142, 1677, 459, 0, 943,
153, 20, 6869, 603, 0, 0, 21, 0, 21, 5545, 7052, 583, 491, 545,
359, 20, 10, 1257, 1571, 1542, 1011, 2300, 17, 1646, 58, 53,
0, 19, 0, 565, 5478, 23, 638, 0, 1941, 16, 119, 11, 1, 20, 7018,
119, 0, 1795, 89, 0, 299, 57, 200, 71, 149, 262, 2326, 45, 4942,
0, 11, 4994, 0, 91, 75, 53, 15, 252, 448, 323, 6923, 1, 1101,
314, 338, 4656, 12, 15, 102, 43, 283, 2496, 21, 1282, 24, 16,
24, 2245, 7817, 365, 222, 255, 155, 66, 96, 2866, 738, 2783,
198, 256, 1521, 311, 0, 925, 203, 465, 6728, 483, 0, 0, 0, 692,
6, 5188, 6343, 598, 507, 271, 297, 18, 0, 679, 840, 1123, 450,
1597, 32, 856, 199, 43, 0, 0, 0, 601, 3544, 0, 553, 0, 1010,
1, 154, 0, 20, 0, 3672, 0, 20, 1140, 54, 540, 0, 18, 166, 0,
188, 43, 1685, 29, 9, 0, 49, 1777, 0, 23, 18, 15, 0, 56, 80,
0, 3439, 0, 701, 107, 161, 1756, 5, 0, 0, 167, 30, 1426, 0, 309,
0, 12, 0, 1470, 1829, 120, 105, 34, 0, 23, 73, 1376, 0, 0, 0,
274, 748, 86, 164, 289, 83, 0, 3726, 318, 15, 0, 17, 165, 0,
2819, 2882, 39, 153, 170, 0, 0), .Dim = c(100L, 7L))
Groups:
c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L,
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L)
Answer 0 (score: 0)
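PAM neither uses nor optimizes SSE.
Instead, the objective of k-medoids is to minimize the total deviation, TD: the sum of the dissimilarities between each point and the medoid of its cluster. I assume the pam function includes this score in the object it returns.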
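If the score has to be recomputed from a distance matrix, a minimal sketch could look like the following. The function name total_deviation and its argument names are hypothetical, and it assumes that medoid_ids holds the row index of each cluster's medoid, ordered by cluster label, and that groups contains labels 1..k.

```r
# Total deviation (TD): sum of the dissimilarities between each observation
# and the medoid of the cluster it is assigned to.
# dmatrix    : full n x n dissimilarity matrix
# medoid_ids : row index of each cluster's medoid, ordered by cluster label
#              (hypothetical argument, assumed layout)
# groups     : integer cluster label in 1..k for each observation
total_deviation <- function(dmatrix, medoid_ids, groups) {
  # A two-column index matrix selects one entry per observation:
  # (row = observation, column = medoid of that observation's cluster).
  sum(dmatrix[cbind(seq_along(groups), medoid_ids[groups])])
}

# Toy example: four points on a line, two clusters with medoids 1 and 3.
x <- c(0, 1, 10, 12)
d <- as.matrix(dist(x))
groups <- c(1L, 1L, 2L, 2L)
medoid_ids <- c(1L, 3L)
td <- total_deviation(d, medoid_ids, groups)
```

With a fitted pam object, the id.med and clustering components would be the natural inputs here. The object also has an objective component; as far as I understand the cluster documentation it reports an average rather than a sum, so check that before comparing it with TD.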
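R itself is slow at interpreted, element-by-element code. For any larger computation, avoid looping in R and push as much of the work as possible into code written in C or Fortran. You may be able to use "vectorized" operations, or you could write your own library in C. Think of R as a "driver" language, a kind of user interface to the underlying libraries.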
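As one possible illustration of the "vectorized" route, the sketch below loops over the clusters instead of over the rows and lets sum() process each within-cluster block in compiled code. The name generate_sse_by_cluster is hypothetical. It is meant to return the same total as generate.sse from the question, under the assumption that dmatrix is a full square matrix and groups has one label per row.

```r
# Sum all within-cluster entries of a distance matrix, one cluster at a time.
# dmatrix : full n x n distance matrix
# groups  : cluster label for each row/column of dmatrix
generate_sse_by_cluster <- function(dmatrix, groups) {
  total <- 0
  for (g in unique(groups)) {
    idx <- which(groups == g)
    # The square block dmatrix[idx, idx] holds every pairwise distance inside
    # cluster g; each pair appears twice, as it does in the row-wise loop.
    total <- total + sum(dmatrix[idx, idx, drop = FALSE])
  }
  total
}

# Toy example: four points on a line, two clusters of two points each.
x <- c(0, 1, 10, 12)
d <- as.matrix(dist(x))
groups <- c(1L, 1L, 2L, 2L)
result <- generate_sse_by_cluster(d, groups)
```

The loop now runs k times rather than n times. The trade-off is memory: extracting dmatrix[idx, idx] copies the block, so one very large cluster temporarily needs space close to that of the full matrix.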