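I have been reading Tom Mitchell's book on machine learning, specifically the part on classification with genetic algorithms. The example given there is fairly simple: it says that if I have the following:
then the fitness function can be defined as:
I want to apply this approach to classifying census income data of the following form: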
39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K
50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K
38, Private, 215646, HS-grad, 9, Divorced, Handlers-cleaners, Not-in-family, White, Male, 0, 0, 40, United-States, <=50K
53, Private, 234721, 11th, 7, Married-civ-spouse, Handlers-cleaners, Husband, Black, Male, 0, 0, 40, United-States, <=50K
28, Private, 338409, Bachelors, 13, Married-civ-spouse, Prof-specialty, Wife, Black, Female, 0, 0, 40, Cuba, <=50K
In this dataset, the attributes are as follows:
age: continuous.
workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.
fnlwgt: continuous.
education: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.
education-num: continuous.
marital-status: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.
occupation: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.
relationship: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.
race: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.
sex: Female, Male.
capital-gain: continuous.
capital-loss: continuous.
hours-per-week: continuous.
native-country
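Ultimately, I want a classifier that, given a person's attributes, predicts whether their income is <=50K or >50K (at most or above 50,000). How can I model the fitness function for this case?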
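One way the bitstring approach could carry over to this data is sketched below. It is a minimal sketch under assumptions that are not in the question: only a hypothetical subset of attributes is encoded (sex, race, relationship, plus age split into assumed bins, since continuous attributes must be discretized to fit a bitstring); each rule reserves one bit per attribute value and matches a record when the bit for each of the record's values is 1; a hypothesis is a list of rules read as "predict >50K if any rule matches, otherwise <=50K"; and fitness is training-set accuracy squared.

```python
# Hypothetical attribute subset; value lists copied from the question.
ATTRIBUTES = {
    "sex": ["Female", "Male"],
    "race": ["White", "Asian-Pac-Islander", "Amer-Indian-Eskimo", "Other", "Black"],
    "relationship": ["Wife", "Own-child", "Husband", "Not-in-family",
                     "Other-relative", "Unmarried"],
}
# Assumed bins [lo, hi) for the continuous "age" attribute.
AGE_BINS = [(0, 25), (25, 40), (40, 60), (60, 200)]
RULE_LEN = sum(len(v) for v in ATTRIBUTES.values()) + len(AGE_BINS)

def age_bin(age):
    for i, (lo, hi) in enumerate(AGE_BINS):
        if lo <= age < hi:
            return i
    raise ValueError(f"age out of range: {age}")

def rule_matches(rule, record):
    """A rule (RULE_LEN bits) matches if the bit for each of the record's values is 1."""
    pos = 0
    for name, values in ATTRIBUTES.items():
        if rule[pos + values.index(record[name])] == 0:
            return False
        pos += len(values)
    return rule[pos + age_bin(record["age"])] == 1

def predict_high(hypothesis, record):
    # Assumed semantics: hypothesis = list of rules, ">50K" if any rule matches.
    return any(rule_matches(rule, record) for rule in hypothesis)

def fitness(hypothesis, examples):
    """Squared training accuracy; examples are (record, is_high_income) pairs."""
    correct = sum(1 for rec, label in examples
                  if predict_high(hypothesis, rec) == label)
    return (correct / len(examples)) ** 2

# The five rows from the question: (record, income > 50K?)
ROWS = [
    ({"age": 39, "sex": "Male", "race": "White", "relationship": "Not-in-family"}, False),
    ({"age": 50, "sex": "Male", "race": "White", "relationship": "Husband"}, False),
    ({"age": 38, "sex": "Male", "race": "White", "relationship": "Not-in-family"}, False),
    ({"age": 53, "sex": "Male", "race": "Black", "relationship": "Husband"}, False),
    ({"age": 28, "sex": "Female", "race": "Black", "relationship": "Wife"}, False),
]

# Example rule: any sex, any race, relationship == Husband, age in [40, 60)
husband_40_60 = [1, 1] + [1] * 5 + [0, 0, 1, 0, 0, 0] + [0, 0, 1, 0]
print(fitness([husband_40_60], ROWS))  # lower than 1: misclassifies the husbands aged 50 and 53
print(fitness([], ROWS))               # 1.0: empty hypothesis predicts <=50K for everyone
```

With these five rows, all labeled <=50K, the empty hypothesis already scores a perfect 1.0, so a real training set needs records from both classes for the fitness to be informative. Squaring the accuracy is optional; it only increases selection pressure toward the most accurate hypotheses.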
Answer
Genetic programming (GP) is usually used for this purpose. Here is a paper describing such a setup: http://web.cs.mun.ca/~banzhaf/papers/ieee_taec.pdf
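A minimal sketch of the general GP idea for this task, not the method from the linked paper: an individual is an arithmetic expression tree over numeric attributes, a record is classified as >50K when the expression evaluates to a positive number, and fitness is the fraction of training records classified correctly. The nested-tuple tree format, the function set and the zero threshold are assumptions made for illustration; selection, crossover and mutation are left out.

```python
import operator

def protected_div(a, b):
    # Protected division, a common GP convention: never raise ZeroDivisionError
    return a / b if b != 0 else 1.0

# Assumed function set and nested-tuple tree format
FUNCTIONS = {"+": operator.add, "-": operator.sub,
             "*": operator.mul, "/": protected_div}

def evaluate(tree, features):
    """tree is a numeric constant, a feature name (str), or a tuple (op, left, right)."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCTIONS[op](evaluate(left, features), evaluate(right, features))
    if isinstance(tree, str):
        return features[tree]
    return float(tree)

def gp_fitness(tree, examples):
    """Fraction of examples classified correctly; output > 0 means '>50K'."""
    correct = sum(1 for feats, label in examples
                  if (evaluate(tree, feats) > 0) == label)
    return correct / len(examples)

# education-num and label of the five rows in the question
examples = [({"education-num": n}, False) for n in (13, 13, 9, 7, 13)]
tree = ("-", "education-num", 12.0)   # ">50K" when education-num > 12
print(gp_fitness(tree, examples))     # 0.4
```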
If you are looking for source code, you can use Riccardo Poli's TinyGP (http://cswww.essex.ac.uk/staff/rpoli/TinyGP/). However, you first have to convert all attributes to numeric values.
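A minimal sketch of that numeric conversion, assuming one-hot encoding for the categorical attributes (one 0/1 position per distinct value) and plain float conversion for the continuous ones. The column names come from the attribute list in the question; the helpers parse_line, build_vocab and to_numeric are hypothetical, and building the vocabulary from the data avoids needing the full native-country value list.

```python
# Column order of the question's data rows; the last column is the label
COLUMNS = ["age", "workclass", "fnlwgt", "education", "education-num",
           "marital-status", "occupation", "relationship", "race", "sex",
           "capital-gain", "capital-loss", "hours-per-week", "native-country",
           "income"]
CONTINUOUS = {"age", "fnlwgt", "education-num", "capital-gain",
              "capital-loss", "hours-per-week"}
FEATURES = COLUMNS[:-1]

def parse_line(line):
    """Hypothetical helper: split one row into (record dict, True if '>50K')."""
    fields = [f.strip() for f in line.split(",")]
    record = dict(zip(COLUMNS, fields))
    label = record.pop("income") == ">50K"
    return record, label

def build_vocab(records):
    """Sorted distinct values of every categorical attribute seen in the data."""
    return {c: sorted({r[c] for r in records})
            for c in FEATURES if c not in CONTINUOUS}

def to_numeric(record, vocab):
    vec = []
    for c in FEATURES:
        if c in CONTINUOUS:
            vec.append(float(record[c]))
        else:
            # one-hot: 1.0 at this value's position; all zeros for unseen values
            vec.extend(1.0 if record[c] == v else 0.0 for v in vocab[c])
    return vec

lines = [
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K",
    "28, Private, 338409, Bachelors, 13, Married-civ-spouse, Prof-specialty, Wife, Black, Female, 0, 0, 40, Cuba, <=50K",
]
parsed = [parse_line(line) for line in lines]
vocab = build_vocab([record for record, _ in parsed])
vectors = [to_numeric(record, vocab) for record, _ in parsed]
print(len(vectors[0]))  # 21: 6 continuous columns + 15 one-hot positions
```

The data already contains education-num, a numeric version of education, so education could be dropped instead of one-hot encoded. Columns with large ranges such as fnlwgt may also need scaling before they are mixed with 0/1 values in a GP expression.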
You can also use other GP variants. I wrote an implementation of Multi Expression Programming (MEP), available here: http://www.mepx.org/source_code.html
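A minimal sketch of the MEP representation, not the code at the link above: a chromosome is a list of genes, and each gene is either a terminal (an attribute name or a constant) or an operator applied to two earlier genes, so every gene encodes its own expression and the chromosome's fitness is taken from its best gene. The tuple gene format, the operator set and the "value > 0 means >50K" threshold are assumptions.

```python
import operator

# Assumed operator set and tuple gene format (op, j, k) with j, k < current index
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def eval_genes(chromosome, features):
    """Evaluate every gene in order; gene i may only reference genes j, k < i."""
    values = []
    for gene in chromosome:
        if isinstance(gene, tuple):
            op, j, k = gene
            values.append(OPS[op](values[j], values[k]))
        elif isinstance(gene, str):
            values.append(float(features[gene]))
        else:
            values.append(float(gene))
    return values

def mep_fitness(chromosome, examples):
    """Each gene is a candidate classifier (value > 0 means '>50K').
    Returns (best accuracy, index of the gene that achieves it)."""
    correct = [0] * len(chromosome)
    for feats, label in examples:
        for i, value in enumerate(eval_genes(chromosome, feats)):
            if (value > 0) == label:
                correct[i] += 1
    best = max(range(len(chromosome)), key=correct.__getitem__)
    return correct[best] / len(examples), best

# age, hours-per-week and label of the five rows in the question
examples = [({"age": a, "hours-per-week": h}, False)
            for a, h in [(39, 40), (50, 13), (38, 40), (53, 40), (28, 40)]]
chromosome = [
    "age",                 # gene 0
    45.0,                  # gene 1
    ("-", 0, 1),           # gene 2: age - 45
    "hours-per-week",      # gene 3
    ("-", 3, 1),           # gene 4: hours-per-week - 45
]
print(mep_fitness(chromosome, examples))  # (1.0, 4)
```

Because genes only reference earlier genes, every chromosome is always evaluable, and one pass over the data scores all of its encoded expressions at once.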