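I am trying to implement the C4.5 algorithm in Java. To get a first understanding of C4.5, I took Python code from this link as a reference. The project contains a file named mine.py with the following function: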
```python
def mine_c45(table, result):
    """ An entry point for C45 algorithm.
    _table_ - a dict representing data table in the following format:
    {
        '<column name>': [<column values>],
        '<column name>': [<column values>],
        ...
    }
    _result_: a string representing a name of column indicating a result.
    """
    # Pick the non-result column with the highest gain.
    col = max([(k, gain(table, k, result)) for k in table.keys() if k != result],
              key=lambda x: x[1])[0]
    tree = []
    for subt in get_subtables(table, col):
        v = subt[col][0]
        if is_mono(subt[result]):
            # Pure subset: the branch ends in a leaf "result=value".
            tree.append(['%s=%s' % (col, v),
                         '%s=%s' % (result, subt[result][0])])
        else:
            del subt[col]
            # The recursive call RETURNS the subtree, which is appended to this branch.
            tree.append(['%s=%s' % (col, v)] + mine_c45(subt, result))
    return tree
```
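The Python function delegates attribute selection to `gain(table, k, result)`, whose body is not shown here. Assuming it computes the standard ID3-style information gain, H(S) minus the sum over values v of |S_v|/|S| · H(S_v) (Quinlan's C4.5 proper ranks attributes by gain ratio, which additionally divides this by the split information), a minimal Java sketch over a column-oriented table might look like this. The `Map<String, List<String>>` table and the class name `InfoGain` are hypothetical stand-ins, not the asker's `Map<String, Attribute>` or `Utils` class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InfoGain {
    // Shannon entropy (base 2) of a list of class labels.
    static double entropy(List<String> labels) {
        Map<String, Integer> counts = new HashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        double h = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / labels.size();
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    // Information gain of nominal column `col` with respect to class column `result`:
    // H(S) - sum over values v of |S_v| / |S| * H(S_v).
    static double gain(Map<String, List<String>> table, String col, String result) {
        List<String> values = table.get(col);
        List<String> labels = table.get(result);
        // Partition the class labels by the value of `col` in the same row.
        Map<String, List<String>> partitions = new HashMap<>();
        for (int row = 0; row < values.size(); row++) {
            partitions.computeIfAbsent(values.get(row), k -> new ArrayList<>()).add(labels.get(row));
        }
        double remainder = 0.0;
        for (List<String> part : partitions.values()) {
            remainder += (double) part.size() / labels.size() * entropy(part);
        }
        return entropy(labels) - remainder;
    }

    public static void main(String[] args) {
        Map<String, List<String>> table = new HashMap<>();
        table.put("windy", List.of("no", "no", "yes", "yes"));
        table.put("play", List.of("yes", "yes", "no", "no"));
        System.out.println("gain(windy) = " + gain(table, "windy", "play"));
    }
}
```

In the four-row table in `main`, `windy` separates the two classes perfectly, so its gain equals the full entropy of `play`, 1 bit (up to floating-point rounding). `List.of` needs Java 9 or later.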
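Using the code from the link, I converted this function to Java and made some modifications. I get the output I want, but I can't figure out how to build the tree recursively.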
Here is the converted Java code:

```java
// Needs: import java.util.Map; import java.util.Map.Entry;
public void mineC45(Map<String, Attribute> table, String result) {
    // Pick the attribute (other than the result column) with the highest gain.
    // The column name and its split point are tracked together: looking up
    // table.keySet() by the index of the maximum in a gains array that skips
    // the result column selects the wrong column whenever the result column
    // comes first in iteration order, and a single `point` variable overwritten
    // in the loop ends up holding the LAST numeric column's split point.
    String column = null;
    double maxGain = Double.NEGATIVE_INFINITY;
    SplitPoints point = null; // split point of the chosen column if it is numeric
    for (Entry<String, Attribute> entry : table.entrySet()) {
        if (entry.getKey().equals(result)) {
            continue;
        }
        double gain;
        SplitPoints candidate = null;
        if (entry.getValue().isNominal()) {
            gain = Utils.gain(table, entry.getKey(), result);
        } else {
            candidate = Utils.numericGain(table, entry.getKey(), result);
            gain = candidate.getGain();
        }
        if (gain > maxGain) {
            maxGain = gain;
            column = entry.getKey();
            point = candidate;
        }
    }
    if (table.get(column).isNominal()) {
        for (Map<String, Attribute> subTable : Utils.createSubTables(table, column)) {
            String value = subTable.get(column).getValues().get(0);
            if (Utils.isMono(subTable.get(result))) {
                System.out.println("\t" + column + " = " + value + " "
                        + result + " = " + subTable.get(result).getValues().get(0));
            } else {
                subTable.remove(column);
                System.out.println(column + " = " + value + " ");
                mineC45(subTable, result);
            }
        }
    } else {
        boolean first = true;
        for (Map<String, Attribute> subTable : Utils.createNSubtables(
                table, column, result, point.getSplitValue())) {
            // As in the original, the first subtable is assumed to hold the
            // values <= the split point and the second those above it.
            String sign = first ? "<=" : ">";
            first = false;
            if (Utils.isMono(subTable.get(result))) {
                System.out.println("\t" + column + " " + sign + " "
                        + point.getSplitValue() + " " + result
                        + " = " + subTable.get(result).getValues().get(0));
            } else {
                subTable.remove(column);
                System.out.println(column + " " + sign + " " + point.getSplitValue() + " ");
                mineC45(subTable, result);
            }
        }
    }
}
```
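The Python version ends up with a tree because each call returns its list of branches and the caller embeds that list (`['%s=%s' % (col, v)] + mine_c45(subt, result)`); `mineC45` returns `void` and only prints, so there is nothing for the caller to assemble. A minimal sketch of the same recursion that returns a node instead might look like the following. It assumes nominal attributes only, and the `C45Tree` class, its `Node` type and the `Map<String, List<String>>` table are hypothetical stand-ins for the asker's `Attribute`/`Utils` classes. Like the Python code it ranks attributes by plain information gain, and it adds one stopping case the Python code lacks: when only the result column is left, it emits a majority-label leaf.

```java
import java.util.*;

public class C45Tree {
    // Hypothetical node type: a leaf holds a class label; an internal node holds the
    // tested attribute and one child per observed value of that attribute.
    static final class Node {
        final String attribute; // null for leaves
        final String label;     // null for internal nodes
        final Map<String, Node> children = new LinkedHashMap<>();
        Node(String attribute, String label) { this.attribute = attribute; this.label = label; }
    }

    // Same recursion as mine_c45, but each call RETURNS its subtree to the caller.
    // `table` maps column name -> column values (nominal attributes only).
    static Node build(Map<String, List<String>> table, String result) {
        List<String> labels = table.get(result);
        // Leaf if the subset is pure or only the result column is left (majority label).
        if (new HashSet<>(labels).size() == 1 || table.size() == 1) {
            return new Node(null, majority(labels));
        }
        String best = null;
        double bestGain = Double.NEGATIVE_INFINITY;
        for (String col : table.keySet()) {
            if (col.equals(result)) continue;
            double g = gain(table, col, result);
            if (g > bestGain) { bestGain = g; best = col; }
        }
        Node node = new Node(best, null);
        for (Map.Entry<String, Map<String, List<String>>> e : subTables(table, best).entrySet()) {
            Map<String, List<String>> sub = e.getValue();
            sub.remove(best);                                  // attribute is used up on this path
            node.children.put(e.getKey(), build(sub, result)); // attach the returned subtree
        }
        return node;
    }
    // One sub-table per distinct value of `col`, preserving row order.
    static Map<String, Map<String, List<String>>> subTables(Map<String, List<String>> table, String col) {
        Map<String, Map<String, List<String>>> out = new LinkedHashMap<>();
        List<String> values = table.get(col);
        for (int row = 0; row < values.size(); row++) {
            Map<String, List<String>> sub = out.computeIfAbsent(values.get(row), k -> new LinkedHashMap<>());
            for (Map.Entry<String, List<String>> e : table.entrySet()) {
                sub.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue().get(row));
            }
        }
        return out;
    }
    // Information gain: H(labels) minus the weighted entropy of each value's partition.
    static double gain(Map<String, List<String>> table, String col, String result) {
        List<String> values = table.get(col), labels = table.get(result);
        Map<String, List<String>> parts = new HashMap<>();
        for (int i = 0; i < values.size(); i++) {
            parts.computeIfAbsent(values.get(i), k -> new ArrayList<>()).add(labels.get(i));
        }
        double remainder = 0.0;
        for (List<String> p : parts.values()) remainder += (double) p.size() / labels.size() * entropy(p);
        return entropy(labels) - remainder;
    }
    // Shannon entropy (base 2) of a list of class labels.
    static double entropy(List<String> labels) {
        double h = 0.0;
        for (int c : counts(labels).values()) {
            double p = (double) c / labels.size();
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }
    // Frequency of each class label.
    static Map<String, Integer> counts(List<String> labels) {
        Map<String, Integer> counts = new HashMap<>();
        for (String l : labels) counts.merge(l, 1, Integer::sum);
        return counts;
    }
    // Most frequent class label (ties broken arbitrarily).
    static String majority(List<String> labels) {
        return Collections.max(counts(labels).entrySet(), Map.Entry.comparingByValue()).getKey();
    }
    // Prints the tree with one indentation level per depth.
    static void print(Node node, String indent, StringBuilder out) {
        for (Map.Entry<String, Node> e : node.children.entrySet()) {
            Node child = e.getValue();
            out.append(indent).append(node.attribute).append(" = ").append(e.getKey());
            if (child.attribute == null) {
                out.append(" -> ").append(child.label).append('\n');
            } else {
                out.append('\n');
                print(child, indent + "  ", out);
            }
        }
    }
    public static void main(String[] args) {
        Map<String, List<String>> table = new LinkedHashMap<>();
        table.put("outlook", List.of("sunny", "sunny", "overcast", "rain", "rain"));
        table.put("windy", List.of("no", "yes", "no", "no", "yes"));
        table.put("play", List.of("no", "no", "yes", "yes", "no"));
        StringBuilder out = new StringBuilder();
        print(build(table, "play"), "", out);
        System.out.print(out);
    }
}
```

For the five-row table in `main`, `outlook` has the higher gain, and only the `rain` branch needs a further split on `windy`, so the program prints:

```
outlook = sunny -> no
outlook = overcast -> yes
outlook = rain
  windy = no -> yes
  windy = yes -> no
```

Numeric attributes could fit the same shape by keying a node's children with `<= split` and `> split` instead of attribute values.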
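Here I use a `Map<String, Attribute>` to represent the table: the String key is the column name, and `Attribute` stores the list of values. Could anyone explain how I can turn this output into a tree so that I can form rules from it?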
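Once a builder returns a tree instead of printing it, rules fall out of a depth-first walk: every root-to-leaf path is one rule, with the conditions along the path joined by AND. The sketch below is one way to do that under stated assumptions: its `Node` type and `TreeRules` class are hypothetical, and children are keyed by the full edge condition (for example `outlook = sunny` or `humidity <= 75`), so that numeric branches can keep the `<=`/`>` sign computed in the numeric branch of `mineC45`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TreeRules {
    // Hypothetical node type for this sketch. A leaf carries a class label; an
    // internal node maps each edge condition (e.g. "outlook = sunny" or
    // "humidity <= 75") to the child reached when that condition holds.
    static final class Node {
        final String label; // non-null only for leaves
        final Map<String, Node> children = new LinkedHashMap<>();

        private Node(String label) {
            this.label = label;
        }

        static Node leaf(String label) {
            return new Node(label);
        }

        static Node split() {
            return new Node(null);
        }

        // Adds a branch and returns this node so that branches can be chained.
        Node branch(String condition, Node child) {
            children.put(condition, child);
            return this;
        }
    }

    // Turns every root-to-leaf path into one rule: IF c1 AND c2 ... THEN result = label.
    static List<String> rules(Node root, String result) {
        List<String> out = new ArrayList<>();
        collect(root, result, new ArrayList<>(), out);
        return out;
    }

    private static void collect(Node node, String result, List<String> path, List<String> out) {
        if (node.label != null) {
            String conditions = path.isEmpty() ? "TRUE" : String.join(" AND ", path);
            out.add("IF " + conditions + " THEN " + result + " = " + node.label);
            return;
        }
        for (Map.Entry<String, Node> e : node.children.entrySet()) {
            path.add(e.getKey());                 // descend along this edge
            collect(e.getValue(), result, path, out);
            path.remove(path.size() - 1);         // backtrack before the next edge
        }
    }

    public static void main(String[] args) {
        Node tree = Node.split()
                .branch("outlook = sunny", Node.split()
                        .branch("humidity <= 75", Node.leaf("yes"))
                        .branch("humidity > 75", Node.leaf("no")))
                .branch("outlook = overcast", Node.leaf("yes"));
        rules(tree, "play").forEach(System.out::println);
    }
}
```

Running `main` prints one rule per leaf, in the order the branches were added:

```
IF outlook = sunny AND humidity <= 75 THEN play = yes
IF outlook = sunny AND humidity > 75 THEN play = no
IF outlook = overcast THEN play = yes
```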