Modifying an ID3 algorithm in C# to use 4 classes instead of the original 2 (error: Object reference not set to an instance of an object)

Time: 2015-01-23 01:48:26

Tags: c# algorithm machine-learning id3

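I am trying to modify the original code, which uses 2 classes (true, false), so that it uses 4 classes (unacc, acc, good, vgood). The original code assumes a boolean, so to adapt it I need to compare strings with strings.

For some reason it fails with the error "Object reference not set to an instance of an object". From what I found on Google, this happens when something has not been initialized.

Error: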

Unhandled Exception: System.NullReferenceException: Object reference not set to an instance of an object.
   at AA.TreeNode..ctor(Attribute attribute) in c:\VS\Program.cs:line 117
   at AA.DecisionTreeID3.internalMountTree(DataTable samples, String targetAttribute, Attribute[] attributes) in c:\VS\Program.cs:line 574
   at AA.DecisionTreeID3.mountTree(DataTable samples, String targetAttribute, Attribute[] attributes) in c:\VS\Program.cs:line 626
   at AA.DecisionTreeID3.internalMountTree(DataTable samples, String targetAttribute, Attribute[] attributes) in c:\VS\Program.cs:line 607
   at AA.DecisionTreeID3.mountTree(DataTable samples, String targetAttribute, Attribute[] attributes) in c:\VS\Program.cs:line 626
   at AA.DecisionTreeID3.internalMountTree(DataTable samples, String targetAttribute, Attribute[] attributes) in c:\VS\Program.cs:line 607
   at AA.DecisionTreeID3.mountTree(DataTable samples, String targetAttribute, Attribute[] attributes) in c:\VS\Program.cs:line 626
   at AA.ID3Sample.Main(String[] args) in c:\VS\Program.cs:line 718

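The original code/solution is available here

I have left the code below. Could comparing the two strings be my mistake? Is it possible to declare a function with a string return type and return a string value at the end? If not, I could change the functions to int and return the values 0, 1, 2 and 3.

Can someone help me or share ideas on how to solve this?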

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Collections;
using System.Data;

namespace AA
{
    /// <summary>
    /// Class that represents an attribute used by the decision class
    /// </summary>
    public class Attribute
    {
        ArrayList mValues;
        string mName;
        object mLabel;

        /// <summary>
        /// Initializes a new instance of the Attribute class
        /// </summary>
        /// <param name="name">The name of the attribute</param>
        /// <param name="values">The possible values for the attribute</param>
        public Attribute(string name, string[] values)
        {
            mName = name;
            mValues = new ArrayList(values);
            mValues.Sort();
        }

        public Attribute(object Label)
        {
            mLabel = Label;
            mName = string.Empty;
            mValues = null;
        }

        /// <summary>
        /// Gets the name of the attribute
        /// </summary>
        public string AttributeName
        {
            get
            {
                return mName;
            }
        }

        /// <summary>
        /// Returns an array with the attribute's values
        /// </summary>
        public string[] values
        {
            get
            {
                if (mValues != null)
                    return (string[])mValues.ToArray(typeof(string));
                else
                    return null;
            }
        }

        /// <summary>
        /// Indicates whether a value is allowed for this attribute
        /// </summary>
        /// <param name="value"></param>
        /// <returns></returns>
        public bool isValidValue(string value)
        {
            return indexValue(value) >= 0;
        }

        /// <summary>
        /// Returns the index of a value
        /// </summary>
        /// <param name="value">Value to look up</param>
        /// <returns>The index at which the value is found</returns>
        public int indexValue(string value)
        {
            if (mValues != null)
                return mValues.BinarySearch(value);
            else
                return -1;
        }

        /// <summary>
        /// 
        /// </summary>
        /// <returns></returns>
        public override string ToString()
        {
            if (mName != string.Empty)
            {
                return mName;
            }
            else
            {
                return mLabel.ToString();
            }
        }
    }

    /// <summary>
    /// Class that represents the built decision tree
    /// </summary>
    public class TreeNode
    {
        private ArrayList mChilds = null;
        private Attribute mAttribute;

        /// <summary>
        /// Initializes a new instance of TreeNode
        /// </summary>
        /// <param name="attribute">Attribute the node is attached to</param>
        public TreeNode(Attribute attribute)
        {
            if (attribute.values != null)
            {
                mChilds = new ArrayList(attribute.values.Length);
                for (int i = 0; i < attribute.values.Length; i++)
                    mChilds.Add(null);
            }
            else
            {
                mChilds = new ArrayList(1);
                mChilds.Add(null);
            }
            mAttribute = attribute;
        }

        /// <summary>
        /// Adds a child TreeNode to this node on the branch named by ValueName
        /// </summary>
        /// <param name="treeNode">Child TreeNode to add</param>
        /// <param name="ValueName">Name of the branch where the TreeNode is attached</param>
        public void AddTreeNode(TreeNode treeNode, string ValueName)
        {
            int index = mAttribute.indexValue(ValueName);
            mChilds[index] = treeNode;
        }

        /// <summary>
        /// Returns the total number of children of the node
        /// </summary>
        public int totalChilds
        {
            get
            {
                return mChilds.Count;
            }
        }

        /// <summary>
        /// Returns a child node of this node
        /// </summary>
        /// <param name="index">Index of the child node</param>
        /// <returns>A TreeNode object representing the node</returns>
        public TreeNode getChild(int index)
        {
            return (TreeNode)mChilds[index];
        }

        /// <summary>
        /// Attribute attached to the node
        /// </summary>
        public Attribute attribute
        {
            get
            {
                return mAttribute;
            }
        }

        /// <summary>
        /// Returns a child of the node by the name of the branch that leads to it
        /// </summary>
        /// <param name="branchName">Name of the branch</param>
        /// <returns>The node</returns>
        public TreeNode getChildByBranchName(string branchName)
        {
            int index = mAttribute.indexValue(branchName);
            return (TreeNode)mChilds[index];
        }
    }

    /// <summary>
    /// Class that implements a decision tree using the ID3 algorithm
    /// </summary>
    public class DecisionTreeID3
    {
        private DataTable mSamples;
        private int mTotalUnacc = 0;
        private int mTotalAcc = 0;
        private int mTotalGood = 0;
        private int mTotalVgood = 0;
        private int mTotal = 0;
        private string mTargetAttribute = "result";
        private double mEntropySet = 0.0;

        /// <summary>
        /// Returns the total number of positive samples in a samples table
        /// </summary>
        /// <param name="samples">DataTable with the samples</param>
        /// <returns>The total number of positive samples</returns>
        /// 

        private int countTotalUnacc(DataTable samples)
        {
            int result = 0;

            foreach (DataRow aRow in samples.Rows)
            {
                if (aRow[mTargetAttribute].ToString().ToUpper().Trim() == "UNACC")
                    result++;
            }

            return result;
        }

        private int countTotalAcc(DataTable samples)
        {
            int result = 0;

            foreach (DataRow aRow in samples.Rows)
            {
                if (aRow[mTargetAttribute].ToString().ToUpper().Trim() == "ACC")
                    result++;
            }

            return result;
        }

        private int countTotalGood(DataTable samples)
        {
            int result = 0;

            foreach (DataRow aRow in samples.Rows)
            {
                if (aRow[mTargetAttribute].ToString().ToUpper().Trim() == "GOOD")
                    result++;
            }

            return result;
        }

        private int countTotalVgood(DataTable samples)
        {
            int result = 0;

            foreach (DataRow aRow in samples.Rows)
            {
                if (aRow[mTargetAttribute].ToString().ToUpper().Trim() == "VGOOD")
                    result++;
            }

            return result;
        }

        /// <summary>
        /// Computes the entropy using the following formula
        /// -p+ log2(p+) - p- log2(p-)
        /// 
        /// where: p+ is the proportion of positive values
        ///        p- is the proportion of negative values
        /// </summary>
        /// <param name="positives">Number of positive values</param>
        /// <param name="negatives">Number of negative values</param>
        /// <returns>The entropy value</returns>
        private double calcEntropy(int unacc, int acc,int good,int vgood)
        {
            //int total = positives + negatives;
            int total = unacc + acc + good + vgood;
            double ratioUnacc = (double)unacc / total;
            double ratioAcc = (double)acc / total;
            double ratioGood = (double)good / total;
            double ratioVgood = (double)vgood / total;

            if (ratioUnacc != 0)
                ratioUnacc = -(ratioUnacc) * System.Math.Log(ratioUnacc, 2);

            if (ratioAcc != 0)
                ratioAcc = -(ratioAcc) * System.Math.Log(ratioAcc, 2);

            if (ratioGood != 0)
                ratioGood = -(ratioGood) * System.Math.Log(ratioGood, 2);

            if (ratioVgood != 0)
                ratioVgood = -(ratioVgood) * System.Math.Log(ratioVgood, 2);

            double result = ratioUnacc + ratioAcc + ratioGood + ratioVgood;

            return result;
        }

        /// <summary>
        /// Scans the samples table, checking an attribute and whether the result is positive or negative
        /// </summary>
        /// <param name="samples">DataTable with the samples</param>
        /// <param name="attribute">Attribute to look up</param>
        /// <param name="value">Allowed value for the attribute</param>
        /// <param name="positives">Will hold the number of rows with the given attribute value and a positive result</param>
        /// <param name="negatives">Will hold the number of rows with the given attribute value and a negative result</param>
        private void getValuesToAttribute(DataTable samples, Attribute attribute, string value, out int unacc, out int acc, out int good, out int vgood)
        {

            unacc = 0;
            acc = 0;
            good = 0;
            vgood = 0;

            foreach (DataRow aRow in samples.Rows)
            {
                if (
                    ((string)aRow[attribute.AttributeName] == value))
                    if ((string)aRow[mTargetAttribute] == "unacc")
                        unacc++;
                    else if ((string)aRow[mTargetAttribute] == "acc")
                        acc++;
                    else if ((string)aRow[mTargetAttribute] == "good")
                        good++;
                    else
                        vgood++;

            }
        }

        /// <summary>
        /// Computes the gain of an attribute
        /// </summary>
        /// <param name="attribute">Attribute to evaluate</param>
        /// <returns>The gain of the attribute</returns>
        private double gain(DataTable samples, Attribute attribute)
        {
            string[] values = attribute.values;
            double sum = 0.0;

            for (int i = 0; i < values.Length; i++)
            {
                int unacc, acc, good, vgood;
                unacc = acc = good = vgood = 0;
                getValuesToAttribute(samples, attribute, values[i], out unacc, out acc, out good, out vgood);
                double entropy = calcEntropy(unacc, acc,good,vgood);
                sum += -(double)(unacc + acc + good + vgood) / mTotal * entropy;
            }
            return mEntropySet + sum;
        }

        /// <summary>
        /// Returns the best attribute.
        /// </summary>
        /// <param name="attributes">An array with the attributes</param>
        /// <returns>The attribute with the highest gain</returns>
        private Attribute getBestAttribute(DataTable samples, Attribute[] attributes)
        {
            double maxGain = 0.0;
            Attribute result = null;

            foreach (Attribute attribute in attributes)
            {
                double aux = gain(samples, attribute);
                if (aux > maxGain)
                {
                    maxGain = aux;
                    result = attribute;
                }
            }
            return result;
        }

        /// <summary>
        /// Returns true if all examples in the sample are positive
        /// </summary>
        /// <param name="samples">DataTable with the samples</param>
        /// <param name="targetAttribute">Attribute (column) of the table to check</param>
        /// <returns>True if all examples in the sample are positive</returns>
        private string allSamplesUnacc(DataTable samples, string targetAttribute)
        {
            foreach (DataRow row in samples.Rows)
            { // to change
                if (row[targetAttribute].ToString() == "acc")
                    return "acc";
                if (row[targetAttribute].ToString() == "good")
                    return "good";
                if (row[targetAttribute].ToString() == "vgood")
                    return "vgood";
            }

            return "unacc";
        }

        private string allSamplesAcc(DataTable samples, string targetAttribute)
        {
            foreach (DataRow row in samples.Rows)
            { // to change
                if (row[targetAttribute].ToString() == "unacc")
                    return "unacc";
                if (row[targetAttribute].ToString() == "good")
                    return "good";
                if (row[targetAttribute].ToString() == "vgood")
                    return "vgood";
            }

            return "acc";
        }
        private string allSamplesGood(DataTable samples, string targetAttribute)
        {
            foreach (DataRow row in samples.Rows)
            { // to change
                if (row[targetAttribute].ToString() == "unacc")
                    return "unacc";
                if (row[targetAttribute].ToString() == "acc")
                    return "acc";
                if (row[targetAttribute].ToString() == "vgood")
                    return "vgood";
            }

            return "good";
        }
        private string allSamplesVgood(DataTable samples, string targetAttribute)
        {
            foreach (DataRow row in samples.Rows)
            { // to change
                if (row[targetAttribute].ToString() == "unacc")
                    return "unacc";
                if (row[targetAttribute].ToString() == "acc")
                    return "acc";
                if (row[targetAttribute].ToString() == "good")
                    return "good";
            }

            return "vgood";
        }

        /// <summary>
        /// Returns a list with all distinct values in a samples table
        /// </summary>
        /// <param name="samples">DataTable with the samples</param>
        /// <param name="targetAttribute">Attribute (column) of the table to check</param>
        /// <returns>An ArrayList with the distinct values</returns>
        private ArrayList getDistinctValues(DataTable samples, string targetAttribute)
        {
            ArrayList distinctValues = new ArrayList(samples.Rows.Count);

            foreach (DataRow row in samples.Rows)
            {
                if (distinctValues.IndexOf(row[targetAttribute]) == -1)
                    distinctValues.Add(row[targetAttribute]);
            }

            return distinctValues;
        }

        /// <summary>
        /// Returns the most common value within a sample
        /// </summary>
        /// <param name="samples">DataTable with the samples</param>
        /// <param name="targetAttribute">Attribute (column) of the table to check</param>
        /// <returns>The object with the highest frequency in the samples table</returns>
        private object getMostCommonValue(DataTable samples, string targetAttribute)
        {
            ArrayList distinctValues = getDistinctValues(samples, targetAttribute);
            int[] count = new int[distinctValues.Count];

            foreach (DataRow row in samples.Rows)
            {
                int index = distinctValues.IndexOf(row[targetAttribute]);
                count[index]++;
            }

            int MaxIndex = 0;
            int MaxCount = 0;

            for (int i = 0; i < count.Length; i++)
            {
                if (count[i] > MaxCount)
                {
                    MaxCount = count[i];
                    MaxIndex = i;
                }
            }

            return distinctValues[MaxIndex];
        }

        /// <summary>
        /// Builds a decision tree based on the given samples
        /// </summary>
        /// <param name="samples">Table with the samples used to build the tree</param>
        /// <param name="targetAttribute">Name of the table column that holds the true or false value
        /// used to validate a sample or not</param>
        /// <returns>The root of the built decision tree</returns>
        private TreeNode internalMountTree(DataTable samples, string targetAttribute, Attribute[] attributes)
        {
            // to change
            if (allSamplesUnacc(samples, targetAttribute) == "unacc")
                return new TreeNode(new Attribute("unacc"));

            if (allSamplesAcc(samples, targetAttribute) == "acc")
                return new TreeNode(new Attribute("acc"));

            if (allSamplesGood(samples, targetAttribute) == "good")
                return new TreeNode(new Attribute("good"));

            if (allSamplesVgood(samples, targetAttribute) == "vgood")
                return new TreeNode(new Attribute("vgood"));

            if (attributes.Length == 0)
                return new TreeNode(new Attribute(getMostCommonValue(samples, targetAttribute)));

            mTotal = samples.Rows.Count;
            mTargetAttribute = targetAttribute;
            mTotalUnacc = countTotalUnacc(samples);
            mTotalAcc = countTotalAcc(samples);
            mTotalGood = countTotalGood(samples);
            mTotalVgood = countTotalVgood(samples);

            mEntropySet = calcEntropy(mTotalUnacc, mTotalAcc, mTotalGood, mTotalVgood);

            Attribute bestAttribute = getBestAttribute(samples, attributes);

            TreeNode root = new TreeNode(bestAttribute);

            DataTable aSample = samples.Clone();

            foreach (string value in bestAttribute.values)
            {
                // Select all rows with this attribute value
                aSample.Rows.Clear();

                DataRow[] rows = samples.Select(bestAttribute.AttributeName + " = " + "'" + value + "'");

                foreach (DataRow row in rows)
                {
                    aSample.Rows.Add(row.ItemArray);
                }
                // Select all rows with this attribute value

                // Create a new attribute list without the current (best) attribute
                ArrayList aAttributes = new ArrayList(attributes.Length - 1);
                for (int i = 0; i < attributes.Length; i++)
                {
                    if (attributes[i].AttributeName != bestAttribute.AttributeName)
                        aAttributes.Add(attributes[i]);
                }
                // Create a new attribute list without the current (best) attribute

                if (aSample.Rows.Count == 0)
                {
                    return new TreeNode(new Attribute(getMostCommonValue(aSample, targetAttribute)));
                }
                else
                {
                    DecisionTreeID3 dc3 = new DecisionTreeID3();
                    TreeNode ChildNode = dc3.mountTree(aSample, targetAttribute, (Attribute[])aAttributes.ToArray(typeof(Attribute)));
                    root.AddTreeNode(ChildNode, value);
                }
            }

            return root;
        }


        /// <summary>
        /// Builds a decision tree based on the given samples
        /// </summary>
        /// <param name="samples">Table with the samples used to build the tree</param>
        /// <param name="targetAttribute">Name of the table column that holds the true or false value
        /// used to validate a sample or not</param>
        /// <returns>The root of the built decision tree</returns>
        public TreeNode mountTree(DataTable samples, string targetAttribute, Attribute[] attributes)
        {
            mSamples = samples;
            return internalMountTree(mSamples, targetAttribute, attributes);
        }
    }

    /// <summary>
    /// Class that demonstrates the use of ID3
    /// </summary>
    class ID3Sample
    {

        public static void printNode(TreeNode root, string tabs)
        {
            Console.WriteLine(tabs + '|' + root.attribute + '|');

            if (root.attribute.values != null)
            {
                for (int i = 0; i < root.attribute.values.Length; i++)
                {
                    Console.WriteLine(tabs + "\t" + "<" + root.attribute.values[i] + ">");
                    TreeNode childNode = root.getChildByBranchName(root.attribute.values[i]);
                    printNode(childNode, "\t" + tabs);
                }
            }
        }


        static DataTable getDataTable()
        {
            DataTable result = new DataTable("samples");
            DataColumn column = result.Columns.Add("buying");
            column.DataType = typeof(string);

            column = result.Columns.Add("maint");
            column.DataType = typeof(string);

            column = result.Columns.Add("doors");
            column.DataType = typeof(string);

            column = result.Columns.Add("persons");
            column.DataType = typeof(string);

            column = result.Columns.Add("lugboot");
            column.DataType = typeof(string);

            column = result.Columns.Add("safety");
            column.DataType = typeof(string);

            column = result.Columns.Add("result");
            column.DataType = typeof(string);

            result.Rows.Add(new object[] { "vhigh", "med", "2", "4", "small", "high", "acc" });
            result.Rows.Add(new object[] { "vhigh", "med", "2", "4", "med", "low", "unacc" });
            result.Rows.Add(new object[] { "vhigh", "med", "2", "4", "med", "med", "unacc" });
            result.Rows.Add(new object[] { "vhigh", "med", "2", "4", "med", "high", "acc" });
            result.Rows.Add(new object[] { "vhigh", "med", "2", "4", "big", "low", "unacc" });
            result.Rows.Add(new object[] { "vhigh", "med", "2", "4", "big", "med", "acc" });
            result.Rows.Add(new object[] { "vhigh", "med", "2", "4", "big", "high", "acc" });
            result.Rows.Add(new object[] { "med", "low", "2", "4", "big", "high", "vgood" });
            result.Rows.Add(new object[] { "med", "low", "2", "more", "small", "low", "unacc" });
            result.Rows.Add(new object[] { "med", "low", "2", "more", "small", "med", "unacc" });
            result.Rows.Add(new object[] { "med", "low", "2", "more", "small", "high", "unacc" });
            result.Rows.Add(new object[] { "med", "low", "2", "more", "med", "low", "unacc" });
            result.Rows.Add(new object[] { "med", "low", "2", "more", "med", "med", "acc" });
            result.Rows.Add(new object[] { "med", "low", "2", "more", "med", "high", "good" });
            result.Rows.Add(new object[] { "med", "low", "2", "more", "big", "low", "unacc" });
            result.Rows.Add(new object[] { "med", "low", "2", "more", "big", "med", "good" });
            result.Rows.Add(new object[] { "med", "low", "2", "more", "big", "high", "vgood" });

            return result;

        }

        /// <summary>
        /// The main entry point for the application.
        /// </summary>
        /// 
        [STAThread]
        static void Main(string[] args)
        {

            Attribute buying = new Attribute("buying", new string[] { "vhigh", "high", "med", "low" });
            Attribute maint = new Attribute("maint", new string[] { "vhigh", "high", "med", "low" });
            Attribute doors = new Attribute("doors", new string[] { "2", "3", "4", "5more" });
            Attribute persons = new Attribute("persons", new string[] { "2", "4", "more" });
            Attribute lugboot = new Attribute("lugboot", new string[] { "small", "med", "big" });
            Attribute safety = new Attribute("safety", new string[] { "low", "med", "high" });

            Attribute[] attributes = new Attribute[] { buying, maint, doors, persons, lugboot, safety };

            DataTable samples = getDataTable();

            DecisionTreeID3 id3 = new DecisionTreeID3();
            TreeNode root = id3.mountTree(samples, "result", attributes);

            printNode(root, "");

        }
    }

}

1 Answer:

Answer 0 (score: 0)

Your stack trace contains the information you need:

AA.TreeNode..ctor(Attribute attribute) in c:\VS\Program.cs:line 117

'ctor' is compiler slang for constructor, so we will take a look at TreeNode(Attribute).
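
As a quick illustration of how a constructor shows up when it throws, here is a minimal sketch. Node and CtorDemo are hypothetical stand-ins, not types from the question; the constructor dereferences its parameter much like TreeNode(Attribute) does. The NoInlining attribute is only there so the constructor frame is not optimized away in a release build.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for the question's TreeNode; only the constructor matters.
public class Node
{
    public int Length { get; }

    [MethodImpl(MethodImplOptions.NoInlining)] // keep the constructor frame visible
    public Node(string[] values)
    {
        // Dereferencing a null parameter throws NullReferenceException here.
        Length = values.Length;
    }
}

public static class CtorDemo
{
    // Returns the name of the method that threw, as reported by the runtime.
    public static string ThrowingMethodName()
    {
        try
        {
            new Node(null);
            return "no exception";
        }
        catch (NullReferenceException ex)
        {
            // For a constructor, the reflected method name is ".ctor".
            return ex.TargetSite.Name;
        }
    }
}
```

Calling CtorDemo.ThrowingMethodName() should report the constructor under the same ".ctor" name that appears in the stack trace above.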

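A NullReferenceException in that constructor most likely means that attribute (the constructor's parameter) is null. How could that happen? Since we have all the code in front of us, let's focus on the one call that does not explicitly create a new Attribute object to pass to the TreeNode constructor: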

Attribute bestAttribute = getBestAttribute(samples, attributes);
TreeNode root = new TreeNode(bestAttribute);
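
getBestAttribute returns null if no attribute has a gain greater than zero, or if there are no attributes at all. To fix the exception, you need to fix either that method's inputs or its logic.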
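
One possible way to handle the "no attribute has positive gain" case is to stop splitting and emit a leaf with the most common class label. The sketch below is only an illustration under assumptions: Id3Sketch and its methods (Entropy, MajorityLabel, BestAttribute, Decide) are hypothetical names, and rows are plain dictionaries instead of the question's DataTable. It also shows that class labels can stay as strings, with entropy computed over any number of classes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Id3Sketch
{
    // Shannon entropy (base 2) for any number of classes, given per-class counts.
    public static double Entropy(IEnumerable<int> counts)
    {
        int[] c = counts.Where(n => n > 0).ToArray();
        double total = c.Sum();
        if (total == 0) return 0.0;
        return -c.Sum(n => (n / total) * Math.Log(n / total, 2));
    }

    // Most frequent label; ties go to the label seen first.
    public static string MajorityLabel(IEnumerable<string> labels)
    {
        return labels.GroupBy(l => l)
                     .OrderByDescending(g => g.Count())
                     .First().Key;
    }

    // Returns the attribute with the highest positive gain, or null if none has one.
    public static string BestAttribute(
        List<Dictionary<string, string>> rows, string target, IEnumerable<string> attributes)
    {
        double baseEntropy = Entropy(rows.GroupBy(r => r[target]).Select(g => g.Count()));
        string best = null;
        double bestGain = 0.0;
        foreach (string a in attributes)
        {
            // Weighted entropy of the subsets produced by splitting on attribute a.
            double remainder = rows.GroupBy(r => r[a]).Sum(subset =>
                (double)subset.Count() / rows.Count *
                Entropy(subset.GroupBy(s => s[target]).Select(x => x.Count())));
            double gain = baseEntropy - remainder;
            if (gain > bestGain) { bestGain = gain; best = a; }
        }
        return best;
    }

    // Guard: when no attribute helps, fall back to a majority-class leaf instead of splitting.
    public static string Decide(
        List<Dictionary<string, string>> rows, string target, IEnumerable<string> attributes)
    {
        string best = BestAttribute(rows, target, attributes);
        return best == null
            ? "leaf:" + MajorityLabel(rows.Select(r => r[target]))
            : "split:" + best;
    }
}
```

In the question's code, the equivalent guard would be a null check on bestAttribute before new TreeNode(bestAttribute), returning a leaf built from getMostCommonValue(samples, targetAttribute) when the check fails.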