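I have two vectors, x and w. w is a numeric vector of weights with the same length as x, giving the weight of each element of x.
I would like to replace elements of x whose differences are small (e.g. 1e-1 or 1e-2) with their weighted average, so as to reduce the length of x. For example, the vectors are as follows: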
w =c(1.459032e-01, 1.535375e-04, 1.829973e-04, 1.057226e-01, 2.833444e-04,
2.559756e-04, 6.440060e-03, 6.294748e-02, 5.984383e-04, 2.772186e-04,
4.869825e-05, 8.212092e-04, 1.233256e-01, 2.558964e-04, 3.990816e-03,
1.665515e-01, 5.760450e-02, 5.803227e-04, 1.738252e-02, 2.431885e-02,
1.280266e-03, 1.000000e-03, 1.000117e-03, 2.750921e-03, 3.588227e-03,
3.489142e-04, 5.117452e-04, 5.117502e-04, 3.262697e-01, 3.060975e-01,
3.089723e-02, 8.603438e-04, 8.603438e-04, 2.558906e-04, 2.558906e-04,
7.559512e-04, 1.054060e-03, 8.318323e-04, 8.602753e-04, 8.603439e-04,
8.269244e-04, 8.602833e-04, 8.979898e-04, 7.745014e-04, 5.117474e-04,
5.691315e+00, 1.780994e+00, 2.416622e-03, 2.441406e-07, 2.441406e-07,
3.065381e-05, 2.441406e-07, 2.441328e-07, 2.441324e-07, 2.884505e-07,
2.441409e-07, 2.441411e-07, 2.441399e-07, 2.441406e-07, 2.441400e-07,
2.441397e-07, 2.441406e-07, 2.441406e-07, 2.441406e-07, 2.441406e-07,
2.441406e-07, 2.441406e-07, 2.441404e-07, 2.441406e-07, 1.920616e-03)
x =c(0.3585121, 0.4399527, 0.5643820, 0.6776966, 0.7542579, 0.8374223, 0.9130900,
0.9999472, 1.0793771, 1.1249381, 1.1700218, 1.2630534, 1.4131273, 1.4795500,
1.5388979, 1.6587155, 1.7106946, 1.8248076, 1.9035620, 1.9512584, 2.0362027,
2.1065388, 2.1525816, 2.2617268, 2.6090246, 2.7180285, 2.7704006, 2.8768953,
2.9358206, 3.0000000, 3.0655239, 3.1266109, 3.1730078, 3.2681434, 3.3125953,
3.3620683, 3.4191661, 3.4851182, 3.5373484, 3.5998778, 3.6622245, 3.7306358,
3.8066598, 3.8726307, 3.9614728, 4.0515907, 4.0998298, 4.1870790, 0.4429813,
0.5619184, 0.6437753, 0.6856169, 1.1212656, 1.2513217, 1.7290070, 1.9762596,
2.0103108, 2.0440587, 2.2404542, 2.2742832, 2.5947769, 3.1292874, 3.1730608,
3.4075734, 3.4651103, 3.5266852, 3.5886457, 3.7197153, 3.7967120, 4.0553866)
I know how to sort vector x, but how do I identify the similar values in x and then compute their weighted average?
Answer (score: 2)
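Updated answer
How about something like this? (See the code below.)
I called the original vectors origx and origw, so the reordered ones are x and w. The code works on temporary copies of x and w (called xtemp and wtemp), which are consumed as it runs, and builds the new x and w (i.e. the "shorter" vectors you are looking for) in the variables xnew and wnew.
Put simply, the code looks at xtemp, finds the first gap larger than a threshold (e.g. 0.05), and combines all the elements from the start of xtemp up to that "big" gap into one group. (If there is no such gap, the whole of xtemp is taken as a single group.) The code then collapses that group into a single weight called wgroup (the sum of the group's weights) and a single representative x value called xgroup (chosen so that xgroup * wgroup equals the weighted sum over all the group's elements, i.e. xgroup is the group's weighted mean). We then append xgroup and wgroup to the vectors xnew and wnew, erase the current group (by removing it from xtemp and wtemp), and carry on in the same way until everything has been grouped.
Give it a try and see what you think :)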
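To make the collapse step described above concrete, here is a tiny hand-made group (the three values and weights are made up purely for illustration, not taken from the data): xgroup is just the weighted mean, so multiplying it by wgroup gives back the group's weighted sum.

```r
# Toy group of three nearby x values and their weights (illustrative values only)
xg <- c(1.00, 1.02, 1.04)
wg <- c(0.5, 0.3, 0.2)

wgroup <- sum(wg)                 # group weight: sum of the weights
xgroup <- sum(xg * wg) / wgroup   # representative x: the weighted mean

# Same value as base R's weighted.mean()
stopifnot(isTRUE(all.equal(xgroup, weighted.mean(xg, wg))))
# xgroup * wgroup reproduces the weighted sum of the group's elements
stopifnot(isTRUE(all.equal(xgroup * wgroup, sum(xg * wg))))

print(c(xgroup = xgroup, wgroup = wgroup))
```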
origw = c(1.459032e-01, 1.535375e-04, 1.829973e-04, 1.057226e-01, 2.833444e-04,
2.559756e-04, 6.440060e-03, 6.294748e-02, 5.984383e-04, 2.772186e-04,
4.869825e-05, 8.212092e-04, 1.233256e-01, 2.558964e-04, 3.990816e-03,
1.665515e-01, 5.760450e-02, 5.803227e-04, 1.738252e-02, 2.431885e-02,
1.280266e-03, 1.000000e-03, 1.000117e-03, 2.750921e-03, 3.588227e-03,
3.489142e-04, 5.117452e-04, 5.117502e-04, 3.262697e-01, 3.060975e-01,
3.089723e-02, 8.603438e-04, 8.603438e-04, 2.558906e-04, 2.558906e-04,
7.559512e-04, 1.054060e-03, 8.318323e-04, 8.602753e-04, 8.603439e-04,
8.269244e-04, 8.602833e-04, 8.979898e-04, 7.745014e-04, 5.117474e-04,
5.691315e+00, 1.780994e+00, 2.416622e-03, 2.441406e-07, 2.441406e-07,
3.065381e-05, 2.441406e-07, 2.441328e-07, 2.441324e-07, 2.884505e-07,
2.441409e-07, 2.441411e-07, 2.441399e-07, 2.441406e-07, 2.441400e-07,
2.441397e-07, 2.441406e-07, 2.441406e-07, 2.441406e-07, 2.441406e-07,
2.441406e-07, 2.441406e-07, 2.441404e-07, 2.441406e-07, 1.920616e-03)
origx = c(0.3585121, 0.4399527, 0.5643820, 0.6776966, 0.7542579, 0.8374223, 0.9130900,
0.9999472, 1.0793771, 1.1249381, 1.1700218, 1.2630534, 1.4131273, 1.4795500,
1.5388979, 1.6587155, 1.7106946, 1.8248076, 1.9035620, 1.9512584, 2.0362027,
2.1065388, 2.1525816, 2.2617268, 2.6090246, 2.7180285, 2.7704006, 2.8768953,
2.9358206, 3.0000000, 3.0655239, 3.1266109, 3.1730078, 3.2681434, 3.3125953,
3.3620683, 3.4191661, 3.4851182, 3.5373484, 3.5998778, 3.6622245, 3.7306358,
3.8066598, 3.8726307, 3.9614728, 4.0515907, 4.0998298, 4.1870790, 0.4429813,
0.5619184, 0.6437753, 0.6856169, 1.1212656, 1.2513217, 1.7290070, 1.9762596,
2.0103108, 2.0440587, 2.2404542, 2.2742832, 2.5947769, 3.1292874, 3.1730608,
3.4075734, 3.4651103, 3.5266852, 3.5886457, 3.7197153, 3.7967120, 4.0553866)
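# Sort by x, carrying the weights along
reord = order(origx)
x = origx[reord]
w = origw[reord]
xnew = wnew = c()   # collapsed values and weights are collected here
thresh = 0.05       # a gap larger than this starts a new group
xtemp = x           # working copies, consumed by the loop
wtemp = w
while (length(xtemp) > 0) {
  # Index of the first element followed by a gap larger than thresh
  nextgap = which(diff(xtemp) > thresh)[1]
  if (!is.na(nextgap)) {
    group = seq_len(nextgap)    # everything up to that gap
  } else {
    group = seq_along(xtemp)    # no big gap left: take the rest
  }
  # Representative x is the weighted mean; group weight is the total weight
  xgroup = sum((xtemp*wtemp)[group])/sum(wtemp[group])
  wgroup = sum(wtemp[group])
  xnew = c(xnew, xgroup)
  wnew = c(wnew, wgroup)
  # Drop the processed group and continue with what is left
  xtemp = xtemp[-group]
  wtemp = wtemp[-group]
}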
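If you would rather avoid the while loop, the same grouping can probably be done in a vectorized way. This is a sketch of a different technique, not the code above: after sorting, a new group starts wherever the gap to the previous element exceeds thresh, so cumsum(c(TRUE, diff(x) > thresh)) gives every element a group id, and base R's rowsum() then totals the weights and weighted values per group. The function name collapse_close is hypothetical.

```r
# Hypothetical helper: merge sorted-adjacent x values whose gaps are <= thresh,
# returning one weighted-mean x and one total weight per group
collapse_close <- function(x, w, thresh = 0.05) {
  stopifnot(length(x) == length(w))
  if (length(x) == 0) return(list(x = numeric(0), w = numeric(0)))
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  # Group id increases by 1 at every gap larger than thresh
  id <- cumsum(c(TRUE, diff(x) > thresh))
  wsum  <- rowsum(w, id)[, 1]       # total weight per group
  xwsum <- rowsum(x * w, id)[, 1]   # weighted sum of x per group
  list(x = unname(xwsum / wsum), w = unname(wsum))
}

# Small made-up example (input deliberately unsorted)
res <- collapse_close(c(3, 2.02, 1, 2, 1.01), c(2, 3, 1, 1, 1))
print(res)
```

With the data above, collapse_close(origx, origw, thresh = 0.05) should give the same values as xnew and wnew from the loop, up to floating-point rounding; all.equal() is an easy way to check.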
Old answer below (superseded by the above):
I suggest reordering x and w so that x is in ascending numerical order, and then using the diff function:
reord = order(x)
x2 = x[reord]
w2 = w[reord]
which(diff(x2)<0.01)
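The last command above shows which elements of x2 (the sorted version of x) are within 0.01 of the next-highest element. The first value is 2, because elements 2 and 3 of x2 are such a pair: x2[2] = 0.4399527 and x2[3] = 0.4429813.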
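Here is the same idea on a short made-up vector (values chosen only for illustration), also listing each close pair side by side. An index i returned by which() means that xs[i] and xs[i + 1] are within the cutoff.

```r
# Illustrative sorted vector with two close pairs
xs <- sort(c(0.36, 0.44, 0.443, 0.56, 0.565, 0.68))

# i is returned when xs[i + 1] - xs[i] < 0.01
idx <- which(diff(xs) < 0.01)
print(idx)

# Show each close pair as a row
print(cbind(lower = xs[idx], upper = xs[idx + 1]))
```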
Also, if you run
sort(diff(x2))
you can see all the differences in ascending order, which may help you decide on a suitable cutoff.
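Another way to choose the cutoff, sketched here as an assumption-laden helper: count how many groups a given threshold would leave. Under the gap rule of the updated answer (a new group starts at every gap larger than the threshold), the number of groups is one more than the number of such gaps. The function name n_groups, the toy vector and the listed thresholds are all illustrative.

```r
# Hypothetical helper: groups left after merging neighbours whose gap is <= t
n_groups <- function(x, t) {
  sum(diff(sort(x)) > t) + 1
}

# Illustrative data and candidate cutoffs
xs <- c(0.36, 0.44, 0.443, 0.56, 0.565, 0.68)
thresholds <- c(1e-3, 1e-2, 5e-2, 1e-1)
counts <- sapply(thresholds, function(t) n_groups(xs, t))
print(counts)
```

With the question's data you would call sapply(thresholds, function(t) n_groups(x2, t)) and pick the threshold whose group count looks reasonable.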