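I am new to Weka, and I am currently running some classification algorithms on a dataset I created.
The dataset has a class attribute {player1,player2,player3}, and the samples are sorted by player.
For example: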
2,748.564,384.103,1.389,0.395,2354.950,0,1858.400,0.353,5,PLAYER_1
1,729.143,391.086,1.479,0.378,2677.350,0,1496.900,0.333,3,PLAYER_1
2,719.765,391.824,1.295,0.469,2659.625,0,1889.429,0.250,2,PLAYER_1
1,726.515,388.121,1.506,0.360,2236.200,0,1431.800,0.364,4,Player_2
2,733.667,387.000,1.241,0.405,2612.450,0,2322.400,0.444,5,Player_2
1,744.343,380.000,1.516,0.366,2461.500,0,1455.050,0.417,3,Player_2
2,729.500,387.167,1.336,0.422,2150.167,0,2092.000,0.429,1,Player_3
1,734.100,398.700,1.522,0.311,2403.500,0,1497.550,0.214,3,Player_3
I noticed that if I shuffle this order randomly, for example:
1,734.100,398.700,1.522,0.311,2403.500,0,1497.550,0.214,3,Player_3
2,748.564,384.103,1.389,0.395,2354.950,0,1858.400,0.353,5,PLAYER_1
1,726.515,388.121,1.506,0.360,2236.200,0,1431.800,0.364,4,Player_2
2,733.667,387.000,1.241,0.405,2612.450,0,2322.400,0.444,5,Player_2
2,742.300,394.600,1.514,0.388,2530.833,0,1454.000,1.000,1,Player_3
.....
the performance of the classifiers often changes. Can anyone explain why this happens? I am using NaiveBayes, RandomForest and LMT as classifiers.
Thanks in advance, Napoleon
Answer:
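Changing the number of CV folds, the CV random seed, or the order of the data will all affect the accuracy you measure for a classifier.
Before any classifier is trained, cross-validation randomly assigns the data to training and test sets. Changing the number of folds therefore gives each training run more or less data, which leads to different results. Changing the seed produces a different assignment for every different value you supply. Likewise, if you reorder the data but keep the same seed, the same row indices are used for training, but those indices now point to different instances because the data is in a different order, so the results differ.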
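The index-reuse effect described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration and not Weka's actual code: it assumes cross-validation works by shuffling row indices with a seeded random generator and then cutting the shuffled indices into k contiguous folds, and it ignores details such as class stratification. The helper names `fold_assignment` and `cv_splits` and the row labels are made up for this example.

```python
import random


def fold_assignment(n_rows, k, seed):
    """Shuffle row indices with a seeded RNG and cut them into k contiguous folds.

    The result depends only on (n_rows, k, seed), never on what the rows contain.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    base, extra = divmod(n_rows, k)
    folds, start = [], 0
    for f in range(k):
        size = base + (1 if f < extra else 0)  # the first `extra` folds get one more row
        folds.append(indices[start:start + size])
        start += size
    return folds


def cv_splits(rows, k, seed):
    """Return a (train, test) pair of row lists for each of the k folds."""
    folds = fold_assignment(len(rows), k, seed)
    splits = []
    for f, test_idx in enumerate(folds):
        test = [rows[i] for i in test_idx]
        # Training set = every row whose index is not in the current test fold.
        train = [rows[i] for g, fold in enumerate(folds) if g != f for i in fold]
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    # Hypothetical stand-ins for the eight instances in the question.
    rows = ["P1_a", "P1_b", "P1_c", "P2_a", "P2_b", "P2_c", "P3_a", "P3_b"]
    reordered = rows[::-1]  # same instances, different order

    # Same seed, same indices per fold, but the folds hold different instances.
    for name, data in [("original", rows), ("reordered", reordered)]:
        tests = [test for _, test in cv_splits(data, k=4, seed=1)]
        print(name, tests)

    # More folds -> more training rows in each training run.
    for k in (2, 4, 8):
        print(k, [len(train) for train, _ in cv_splits(rows, k, seed=1)])
```

Because the fold pattern depends only on the number of rows, `k` and the seed, running the sketch on `rows` and on `reordered` with the same seed reuses exactly the same indices, yet the test folds end up containing different instances. Changing `k` changes how many rows each training run sees, and changing `seed` changes the index pattern itself.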