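Honestly, I'm frustrated. I'm trying to write a decision tree algorithm from scratch. I know how to find the root of the tree, but I don't know how to make this code recursive so that it produces the full decision tree. Say I have data like this: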
outlook, temperature, humidity, wind, playTennis
sunny, hot, high, weak, no
sunny, hot, high, strong, no
overcast, hot, high, weak, yes
rain, mild, high, weak, yes
rain, cool, normal, weak, yes
rain, cool, normal, strong, no
overcast, cool, normal, strong, yes
sunny, mild, high, weak, no
sunny, cool, normal, weak, yes
rain, mild, normal, weak, yes
sunny, mild, normal, strong, yes
overcast, mild, high, strong, yes
overcast, hot, normal, weak, yes
rain, mild, high, strong, no
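What I want to do is create a class called Feature (e.g. outlook), where each Feature has a list of FeatureValue objects (e.g. sunny, overcast, rain). In the Feature class I compute the feature's information gain, and in the FeatureValue class I compute the entropy of that value.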
The Feature class:
    import java.util.LinkedList;

    public class Feature implements Comparable<Feature> {
        public String name;
        public LinkedList<FeatureValue> featureValueList = new LinkedList<FeatureValue>();
        public double entropy;
        public double informationGain;
        // Both fields below are used by getBestFeature() in DecisionTree.
        public int index;          // position of this feature in featureList
        public boolean isDeleted;  // true once this feature has been used for a split

        public Feature() {
        }

        public Feature(String name) {
            this.name = name;
        }

        // Gain = entropy of the target minus the weighted entropy of each value's subset.
        public void calculateInformationGain(double targetEntropy, int numberOfExamples) {
            double sum = 0;
            for (FeatureValue value : featureValueList) {
                sum += ((double) value.total / numberOfExamples) * value.entropy;
            }
            informationGain = targetEntropy - sum;
        }

        // Orders features by information gain so Collections.max picks the best one.
        @Override
        public int compareTo(Feature o) {
            return Double.compare(this.informationGain, o.informationGain);
        }
    }
The FeatureValue class:
    public class FeatureValue {
        public String name;
        public int targetClassCounter = 0;     // examples with this value in the target class
        public int notTargetClassCounter = 0;  // examples with this value in the other class
        public double entropy;
        public int total;
        public boolean isClassified = false;
        public String classified;

        public FeatureValue() {
        }

        public FeatureValue(String name, int targetClassCounter) {
            this.name = name;
            this.targetClassCounter = targetClassCounter;
        }

        public FeatureValue(String name) {
            this.name = name;
        }

        // Binary entropy (in bits) of the examples that have this value.
        public void calculateEntropy() {
            total = targetClassCounter + notTargetClassCounter;
            double ppe = (double) targetClassCounter / total;
            double npe = (double) notTargetClassCounter / total;
            double result = -ppe * (Math.log(ppe) / Math.log(2)) - npe * (Math.log(npe) / Math.log(2));
            // 0 * log(0) and 0 / 0 both produce NaN; a pure or empty subset has entropy 0.
            if (Double.isNaN(result))
                result = 0;
            this.entropy = result;
        }
    }
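As a sanity check on the two calculations above, here is a minimal standalone sketch that computes the same quantities directly from yes/no counts. The class name `EntropySketch` and its methods are made up for illustration and are not part of the classes above. The counts for outlook are read off the table: sunny has 2 yes and 3 no, overcast has 4 yes and 0 no, rain has 3 yes and 2 no.

```java
class EntropySketch {
    // One term of the entropy sum; 0 * log2(0) is taken to be 0.
    private static double term(double p) {
        return p == 0.0 ? 0.0 : -p * (Math.log(p) / Math.log(2));
    }

    // Binary entropy in bits for a subset with `pos` and `neg` examples.
    static double entropy(int pos, int neg) {
        int total = pos + neg;
        if (total == 0) return 0.0; // empty subset: define entropy as 0
        return term((double) pos / total) + term((double) neg / total);
    }

    // counts[i] = {pos, neg} for the i-th value of one feature.
    static double informationGain(int[][] counts) {
        int pos = 0, neg = 0;
        for (int[] c : counts) { pos += c[0]; neg += c[1]; }
        int total = pos + neg;
        double remainder = 0.0;
        for (int[] c : counts) {
            // weight each value's entropy by the share of examples it covers
            remainder += ((double) (c[0] + c[1]) / total) * entropy(c[0], c[1]);
        }
        return entropy(pos, neg) - remainder;
    }

    public static void main(String[] args) {
        int[][] outlook = { {2, 3}, {4, 0}, {3, 2} }; // sunny, overcast, rain
        System.out.println(entropy(9, 5));            // entropy of playTennis
        System.out.println(informationGain(outlook)); // gain of splitting on outlook
    }
}
```

For this table the target entropy comes out at about 0.940 bits and the gain for outlook at about 0.247, which is what the Feature and FeatureValue classes should reproduce for the root.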
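So I made a class called DecisionTree. It has a list of features called featureList (e.g. outlook, temperature, wind). I have the function below, which computes the entropies and information gains to find the best feature for the first split. The data above is loaded into a 2D String array called dataset (data rows only, without the header row). The function returns a node holding the best feature, which becomes the root of the decision tree: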
    public LinkedList<Feature> featureList = new LinkedList<Feature>();

    public GenericTreeNode<String> getBestFeature() {
        // Pass 1: count target / non-target examples overall and per feature value.
        for (int row = 0; row < dataset.length; row++) {
            String label = dataset[row][dataset[row].length - 1].trim();
            boolean isTarget = label.equals(targetClass);
            if (isTarget)
                numberOfTargetInstances++;
            else
                numberOfNotTargetInstances++;
            for (int col = 0; col < dataset[row].length - 1; col++) {
                String value = dataset[row][col].trim();
                if (value.equals("?"))
                    continue; // skip missing values
                Feature feature = featureList.get(col);
                if (feature.isDeleted)
                    continue; // skip features already used for a split
                int current = getIndexOf(feature.featureValueList, value);
                if (isTarget)
                    feature.featureValueList.get(current).targetClassCounter++;
                else
                    feature.featureValueList.get(current).notTargetClassCounter++;
            }
        }
        // Pass 2: entropy of every feature value, then of the target class.
        for (Feature feature : featureList)
            for (FeatureValue value : feature.featureValueList)
                value.calculateEntropy();
        calculateEntropyOfTargetClass();
        // Pass 3: information gain of every feature.
        // Note: the second parameter of calculateInformationGain is the number of examples.
        for (Feature feature : featureList) {
            feature.calculateInformationGain(targetEntropy, numberOfFeatureValues);
        }
        // The feature with the highest gain becomes the node; mark it as used.
        Feature max = Collections.max(featureList);
        featureList.get(max.index).isDeleted = true;
        GenericTreeNode<String> best = new GenericTreeNode<String>(max.name);
        return best;
    }
Now here is where I'm stuck: how can I make this function recursive so that it actually builds the tree?
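For reference, a minimal sketch of the kind of recursion being asked about, in the usual ID3 style: pick the best feature for the current rows, split the rows by that feature's values, and recurse on each subset until a subset is pure or no features are left. It deliberately does not use the Feature, FeatureValue or GenericTreeNode classes above. `Id3Sketch`, `Node`, `build` and the other helper names are hypothetical, and the code assumes every row is a String[] with the class label in the last column and no missing values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class Id3Sketch {
    // Hypothetical node: an inner node holds a feature name, a leaf holds a class label.
    static class Node {
        String label;
        Map<String, Node> children = new LinkedHashMap<>(); // feature value -> subtree
        Node(String label) { this.label = label; }
    }

    static final String[] NAMES = {"outlook", "temperature", "humidity", "wind"};
    static final String[] ROWS = { // the table from the question
        "sunny,hot,high,weak,no", "sunny,hot,high,strong,no", "overcast,hot,high,weak,yes",
        "rain,mild,high,weak,yes", "rain,cool,normal,weak,yes", "rain,cool,normal,strong,no",
        "overcast,cool,normal,strong,yes", "sunny,mild,high,weak,no", "sunny,cool,normal,weak,yes",
        "rain,mild,normal,weak,yes", "sunny,mild,normal,strong,yes", "overcast,mild,high,strong,yes",
        "overcast,hot,normal,weak,yes", "rain,mild,high,strong,no"
    };

    static List<String[]> sampleRows() {
        List<String[]> rows = new ArrayList<>();
        for (String line : ROWS) rows.add(line.split(","));
        return rows;
    }

    // Groups rows by the value found in column `col` (the label is the last column).
    static Map<String, List<String[]>> partition(List<String[]> rows, int col) {
        Map<String, List<String[]>> parts = new LinkedHashMap<>();
        for (String[] r : rows) parts.computeIfAbsent(r[col], k -> new ArrayList<>()).add(r);
        return parts;
    }

    // Entropy in bits of the class labels of `rows`.
    static double entropy(List<String[]> rows) {
        double h = 0;
        for (List<String[]> part : partition(rows, rows.get(0).length - 1).values()) {
            double p = (double) part.size() / rows.size();
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    // Most frequent class label; ties go to the label seen first.
    static String majority(List<String[]> rows) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, List<String[]>> e : partition(rows, rows.get(0).length - 1).entrySet()) {
            if (e.getValue().size() > bestCount) { bestCount = e.getValue().size(); best = e.getKey(); }
        }
        return best;
    }

    // The recursive step: `unused` lists the feature columns still available for splitting.
    static Node build(List<String[]> rows, String[] names, List<Integer> unused) {
        double h = entropy(rows);
        if (h == 0.0 || unused.isEmpty()) return new Node(majority(rows)); // base case: leaf
        int bestCol = -1;
        double bestGain = -1;
        for (int col : unused) {
            double remainder = 0;
            for (List<String[]> part : partition(rows, col).values())
                remainder += (double) part.size() / rows.size() * entropy(part);
            if (h - remainder > bestGain) { bestGain = h - remainder; bestCol = col; }
        }
        Node node = new Node(names[bestCol]);
        List<Integer> rest = new ArrayList<>(unused); // copy, so sibling branches are unaffected
        rest.remove(Integer.valueOf(bestCol));
        for (Map.Entry<String, List<String[]>> e : partition(rows, bestCol).entrySet())
            node.children.put(e.getKey(), build(e.getValue(), names, rest)); // recurse per value
        return node;
    }

    public static void main(String[] args) {
        Node root = build(sampleRows(), NAMES, new ArrayList<>(Arrays.asList(0, 1, 2, 3)));
        System.out.println(root.label + " -> " + root.children.keySet());
    }
}
```

The point of the sketch is that each recursive call receives its own subset of rows and its own copy of the remaining features, so counts are recomputed from scratch per subset. In getBestFeature() above, the counters and the isDeleted flags live in shared fields, so they would carry over from one call to the next unless they are reset or passed down per branch.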