How do I convert numerically encoded categorical data into sparse tensors in TensorFlow?

Time: 2016-12-16 05:53:00

Tags: python machine-learning tensorflow logistic-regression

My dataset has the following format:

8,2,1,1,1,0,3,2,6,2,2,2,2
8,2,1,2,0,0,15,2,1,2,2,2,1
5,5,4,4,0,0,6,1,6,2,2,1,2
8,2,1,3,0,0,2,2,6,2,2,2,2
8,2,1,2,0,0,3,2,1,2,2,2,1
8,2,1,4,0,1,3,2,1,2,2,2,1
8,2,1,2,0,0,3,2,1,2,2,2,1
8,2,1,3,0,0,2,2,6,2,2,2,2
8,2,1,12,0,0,5,2,2,2,2,2,1
3,1,1,2,0,0,3,2,1,2,2,2,1

It consists entirely of categorical data, with each feature encoded as a number. I tried the following code, but I get an error:

    monthly_income = tf.contrib.layers.sparse_column_with_keys(
        "monthly_income", keys=['1', '2', '3', '4', '5', '6'])
    # Other columns are also declared in the same way

    m = tf.contrib.learn.LinearClassifier(
        feature_columns=[
            caste, religion, differently_abled, nature_of_activity, school,
            dropout, qualification, computer_literate, monthly_income,
            smoke, drink, tobacco, sex],
        model_dir=model_dir)

1 Answer:

Answer 0 (score: 5)

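I think the problem lies outside the code you have shown. My guess is that the features in the CSV file are being read as ints, whereas passing keys=['1', '2', ...] means you expect them to be strings.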
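The suspected mismatch can be illustrated in plain Python, without TensorFlow. This is only a sketch of the type problem, not a reproduction of TensorFlow's key lookup. It assumes that the CSV columns appear in the same order as the names in the feature_columns list, so that monthly_income is the 9th field; the names SAMPLE, MONTHLY_INCOME_INDEX, and read_column are made up for the example.

```python
import csv
import io

# Two rows copied from the sample data in the question.
SAMPLE = "8,2,1,1,1,0,3,2,6,2,2,2,2\n8,2,1,2,0,0,15,2,1,2,2,2,1\n"

# Assumption: CSV column order matches the feature_columns list,
# which would make monthly_income the 9th field (index 8).
MONTHLY_INCOME_INDEX = 8

# The keys passed to sparse_column_with_keys in the question (strings).
KEYS = ['1', '2', '3', '4', '5', '6']


def read_column(text, index, as_int):
    """Return one CSV column, either as ints or as the raw strings."""
    rows = csv.reader(io.StringIO(text))
    return [int(row[index]) if as_int else row[index] for row in rows]


as_ints = read_column(SAMPLE, MONTHLY_INCOME_INDEX, as_int=True)
as_strs = read_column(SAMPLE, MONTHLY_INCOME_INDEX, as_int=False)

# An int never equals a string key, so every integer lookup misses.
ints_found = [value in KEYS for value in as_ints]
strs_found = [value in KEYS for value in as_strs]

print(as_ints, ints_found)
print(as_strs, strs_found)
```

If the input pipeline parses the fields as integers, one option is to convert them to strings before they reach a column declared with string keys; the other is to switch to a column type that accepts integers.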

Nevertheless, in this case I would suggest using sparse_column_with_integerized_feature:

monthly_income = tf.contrib.layers.sparse_column_with_integerized_feature("monthly_income", bucket_size=7)
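As far as I understand the documentation, an integerized column expects every value to fall in the range [0, bucket_size), which would explain bucket_size=7 for codes that run from 1 to 6. The plain-Python sketch below computes the smallest bucket size that covers a column. The helper min_bucket_size is hypothetical, and the list assumes monthly_income is the 9th field of the sample rows in the question.

```python
def min_bucket_size(values):
    """Smallest bucket_size such that every value lies in [0, bucket_size)."""
    if not values:
        raise ValueError("need at least one value")
    if min(values) < 0:
        raise ValueError("integerized features must be non-negative")
    return max(values) + 1


# Assumption: the 9th field of each sample row in the question.
monthly_income_values = [6, 1, 6, 6, 1, 1, 1, 6, 2, 1]

bucket_size = min_bucket_size(monthly_income_values)
print(bucket_size)
```

A sample may not contain the largest code that occurs in the full dataset, so it is safer to derive the bucket size from the known coding scheme than from a handful of rows.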