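Given is a vector of double values. I want to know which distances between any elements of this vector are similar to each other. In the best case, the result is a vector of subsets of the original values, where each subset should have at least n members.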
//given
vector<double> values = {1,2,3,4,8,10,12}; //with simple values as example
//some algorithm
//desired result as:
vector<vector<double> > subset;
//in case of above example I would expect some result like:
//subset[0] = {1,2,3,4}; //distance 1
//subset[1] = {8,10,12}; //distance 2
//subset[2] = {4,8,12}; // distance 4
//subset[3] = {2,4}; //also distance 2 but not connected with subset[1]
//subset[4] = {1,3}; //also distance 2 but not connected with subset[1] or subset[3]
//many others if n is just 2. If n is 3 (normally the minimum) these small subsets should be excluded.
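This example is simple because the distances are integers, so one could iterate over candidate distances and test the vector against each one. That does not work for double or float values.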
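For floating-point values, "the same distance" has to mean "equal within a tolerance". A minimal sketch, assuming a hypothetical helper `similarDistance` and a caller-chosen tolerance `tol` (neither name comes from the question):

```cpp
#include <cmath>

// Hypothetical helper (not from the question): two distances count as
// "similar" if they differ by at most tol.
bool similarDistance(double a, double b, double tol) {
    return std::fabs(a - b) <= tol;
}
```

For example, `0.3 - 0.1` and `0.2` are typically not bit-identical as doubles, so `==` would miss them, while `similarDistance(0.3 - 0.1, 0.2, 1e-9)` accepts them.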
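My idea so far
I thought about calculating the distances and storing them in a vector, then building a matrix of the differences between those distances and thresholding that matrix with some tolerance to find similar distances.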
//Calculate distances: result is a vector
vector<double> distances;
for (int i = 0; i < values.size(); i++)
for (int j = 0; j < values.size(); j++)
{
if (i >= j)
continue;
distances.push_back(abs(values[i] - values[j]));
}
//Calculate difference of these distances: result is a matrix
Mat DiffDistances = Mat::zeros(Size(distances.size(), distances.size()), CV_32FC1);
for (int i = 0; i < distances.size(); i++)
for (int j = 0; j < distances.size(); j++)
{
if (i >= j)
continue;
DiffDistances.at<float>(i,j) = abs(distances[i] - distances[j]);
}
//threshold this matrix with some tolerance in difference distances
threshold(DiffDistances, DiffDistances, maxDistTol, 255, CV_THRESH_BINARY_INV);
//get points with similar distances
vector<Point> DiffDistancePoints;
findNonZero(DiffDistances, DiffDistancePoints);
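At this point I need to find the original values that correspond to my similar distances. It should be possible to find them, but tracing back the indices seems very complicated, and I wonder whether there is an easier way to solve the problem.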
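One way to avoid tracing indices back through the distance vector and the matrix is to store the index pair next to each distance. A minimal sketch, assuming a hypothetical `PairDistance` struct and `sortedPairDistances` helper (neither appears in the code above):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical record: one pairwise distance plus the indices it came from.
struct PairDistance {
    std::size_t i;
    std::size_t j;
    double dist;
};

// Build all pairs (i < j) and sort them by distance, so that pairs with
// similar distances end up next to each other in the result.
std::vector<PairDistance> sortedPairDistances(const std::vector<double>& values) {
    std::vector<PairDistance> pairs;
    for (std::size_t i = 0; i < values.size(); ++i) {
        for (std::size_t j = i + 1; j < values.size(); ++j) {
            pairs.push_back({i, j, std::fabs(values[i] - values[j])});
        }
    }
    std::sort(pairs.begin(), pairs.end(),
              [](const PairDistance& a, const PairDistance& b) {
                  return a.dist < b.dist;
              });
    return pairs;
}
```

After sorting, pairs whose distances lie within the tolerance of each other form contiguous runs, and every entry already carries the indices `i` and `j` of the values it was computed from.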
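Answer 0: (score: 1)
Here is an algorithm that differs slightly from yours. It runs in O(n^3) for a vector of length n, so it is not very efficient.
It is based on the premise that you want subsets with at least 2 members. What you can do, then, is consider every two-element subset of the vector and find all other elements that also match.
So given a function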
std::vector<int> findSubset(std::vector<int> v, int baseValue, int distance) {
// Find the subset of all elements in v that differ by a multiple of
// distance from the base value
}
you can do
std::vector<std::vector<int>> findSubsets(std::vector<int> v) {
std::vector<std::vector<int>> subsets; // one candidate subset per pair (i, j)
for(int i = 0; i < v.size(); i++) {
for(int j = i + 1; j < v.size(); j++) {
subsets.push_back(findSubset(v, v[i], abs(v[i] - v[j])));
}
}
return subsets;
}
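The only remaining problem is keeping track of duplicates. Perhaps you could keep a hashed list of (baseValue % distance, distance) pairs for all the subsets you have already found.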
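The pieces above can be put together roughly as follows. This is only a sketch under stated assumptions: the body of `findSubset` is my reading of its comment, the `seen` set is one possible way to hold the (baseValue % distance, distance) keys (an ordered `std::set` rather than a hash), and pairs of equal values are skipped because they give no step size.

```cpp
#include <cstddef>
#include <cstdlib>
#include <set>
#include <utility>
#include <vector>

// Elements of v that differ from baseValue by a multiple of distance.
// Assumes distance > 0.
std::vector<int> findSubset(const std::vector<int>& v, int baseValue, int distance) {
    std::vector<int> subset;
    for (int x : v) {
        if ((x - baseValue) % distance == 0) {
            subset.push_back(x);
        }
    }
    return subset;
}

std::vector<std::vector<int>> findSubsets(const std::vector<int>& v) {
    std::vector<std::vector<int>> subsets;
    // Keys already handled: (baseValue mod distance, distance).
    std::set<std::pair<int, int>> seen;
    for (std::size_t i = 0; i < v.size(); ++i) {
        for (std::size_t j = i + 1; j < v.size(); ++j) {
            int distance = std::abs(v[i] - v[j]);
            if (distance == 0) continue;  // equal values: no step size to test
            // Normalize the remainder so negative values map to the same key.
            int residue = ((v[i] % distance) + distance) % distance;
            if (seen.insert({residue, distance}).second) {
                subsets.push_back(findSubset(v, v[i], distance));
            }
        }
    }
    return subsets;
}
```

Note that this collects every element lying on the same grid of multiples, so for the example vector the pair (4, 8) yields {4, 8, 12}, but the pair (1, 2) yields the whole vector rather than only {1, 2, 3, 4}. Splitting such a result into runs of consecutive steps would be an extra pass.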
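Answer 1: (score: 1)
Here is a solution that works as long as there are no branches, meaning that no values are closer together than 2*threshold. This is the valid neighbor region because, if I understood @Phann correctly, neighboring distances should differ by less than the threshold.
The solution is definitely neither the fastest nor the nicest one possible, but you can use it as a starting point: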
#include <iostream>
#include <vector>
#include <algorithm>
int main(){
std::vector< double > values = {1,2,3,4,8,10,12};
const unsigned int nValues = values.size();
std::vector< std::vector< double > > distanceMatrix(nValues - 1);
// The distanceMatrix has a triangular shape
// First vector contains all distances to value zero
// Second row all distances to value one for larger values
// nth row all distances to value n-1 except those already covered
std::vector< std::vector< double > > similarDistanceSubsets;
double threshold = 0.05;
std::sort(values.begin(), values.end());
for (unsigned int i = 0; i < nValues-1; ++i) {
distanceMatrix.at(i).resize(nValues-i-1);
for (unsigned j = i+1; j < nValues; ++j){
distanceMatrix.at(i).at(j-i-1) = values.at(j) - values.at(i);
}
}
for (unsigned int i = 0; i < nValues-1; ++i) {
for (unsigned int j = i+1; j < nValues; ++j) {
std::vector< double > thisSubset;
double thisDist = distanceMatrix.at(i).at(j-i-1);
// This distance already belongs to another cluster
if (thisDist < 0) continue;
double minDist = thisDist - threshold;
double maxDist = thisDist + threshold;
thisSubset.push_back(values.at(i));
thisSubset.push_back(values.at(j));
//Indicate that this is already clustered
distanceMatrix.at(i).at(j-i-1) = -1;
unsigned int lastIndex = j;
for (unsigned int k = j+1; k < nValues; ++k) {
thisDist = distanceMatrix.at(lastIndex).at(k-lastIndex-1);
// This distance already belongs to another cluster
if (thisDist < 0) continue;
// Check if you found a new valid pair
if ((thisDist > minDist) && (thisDist < maxDist)){
// Update the valid distance interval
minDist = thisDist - threshold;
maxDist = thisDist + threshold;
// Add the newly found point
thisSubset.push_back(values.at(k));
// Indicate that this is already clustered
distanceMatrix.at(lastIndex).at(k-lastIndex-1) = -1;
// Continue the search from here
lastIndex = k;
}
}
if (thisSubset.size() > 2) {
similarDistanceSubsets.push_back(thisSubset);
}
}
}
for (unsigned int i = 0; i < similarDistanceSubsets.size(); ++i) {
for (unsigned int j = 0; j < similarDistanceSubsets.at(i).size(); ++j) {
std::cout << similarDistanceSubsets.at(i).at(j);
if (j != similarDistanceSubsets.at(i).size()-1) {
std::cout << " ";
}
else {
std::cout << std::endl;
}
}
}
}
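The idea is to precompute the distances and then, for each pair of values, starting with the smallest value and its larger neighbors, check whether there is another valid pair above it. If so, these values are all collected in one subset, which is added to the vector of subsets. For each new value, the valid neighbor region has to be updated to ensure that neighboring distances differ by less than the threshold. After that, the program continues with the next smallest value and its larger neighbors, and so on.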