Question

我有一组使用图像处理提取的240个特征。目标是在训练后将测试用例分为7个不同的类。对于每个类，大约有60个观测值（即，每个类有大约60个特征向量，每个向量有240个组件）。

许多研究论文和书籍都使用顺序前向搜索或顺序后向搜索来从特征向量中选择最佳特征。下图给出了顺序前向搜索算法。 Here is a snapshot of the SFS algorithm

任何此类算法都使用一些标准来区分特征。一种常见的方法是使用Bhattacharyya距离作为标准。 Bhattacharyya距离是分布之间的分歧类型度量。在一些研究和研究中，我发现给定A类的矩阵M1，该类包含该类的所有60个特征向量，使得它具有n = 60行和m = 240列（因为总共有240个特征）并且类BI的类似矩阵M2可以找出它们之间的Bhattacharyya距离并找到它们的相互依赖性。

我的问题是如何整合这两者。如何将Bhattacharyya距离作为选择算法中最佳特征的标准，如上所述。

Answer 1

在Arthur B.的帮助下，我终于理解了这个概念。这是我的实现。虽然我使用了Plus l Take away r算法（Sequential Forwards Forward Search Search）Ill post，因为一旦删除了Backward Search，它就基本相同了。以下实现是在matlab中，但很容易理解：

S=zeros(Size,1); %Initial the binary array feature list with all zeros implying no feature selected
k=0;
while k<n  %Begin SFS. n is the number of features that need to be extracted
t=k+l;     %l is the number of features to be added in each iteration
while k<t
    R=zeros(Size,1);  %Size is the total number of features
    for i=1:Size
        if S(i)==0    %If the feature has not been selected. S is a binary array which puts a one against each feature that is selected
            S_copy=S;
            S_copy(i)=1;
            R=OperateBhattacharrya(Matrices,S_copy,i,e,R);  %The result of each iteration is stored in R
        end
    end
    k=k+1;   %increment k
    [~,N]=max(R);  %take the index of the maximum element in R as the best feature to be selected
    S(N)=1;        % put the index of selected feature as 1
end
t=k-r;    %r is the number of features to be removed after selecting l features. l>r
while k>t  %start Sequential Backward Search 
    R=zeros(Size,1);
    for i=1:Size
        if S(i)==1
            S_copy=S;
            S_copy(i)=0;
            R=OperateBhattacharrya(Matrices,S_copy,i,1,R);
        end
    end
    k=k-1;
    [~,N]=max(R);
    S(N)=0;
end
fprintf('Iteration :%d--%d\n',k,t);
end

我希望这可以帮助任何有类似问题的人。

Answer 2

这是算法的“评估分支”部分，除了你首先在一维向量，二维向量等上使用这个Bhattacharyya距离。

使用Bhattacharyya距离进行特征选择

2 个答案: