Weka - Is there a good way to handle (many) numeric attributes when classifying a nominal value?

Time: 2014-12-06 19:30:12

Tags: weka


I have a lot of numeric values and, in the end, I want to predict a result. The result can take one of the nominal values '0', '1' or 'x'.

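What I want to know is how I can get the best results. Can some classifiers handle numeric attributes better than others? Sometimes a classifier seems to focus on an attribute that is not very interesting...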

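At the moment, the prefix h. means the home team and a. means the away team. Would it be better to split each match into two instances and add an attribute such as @attribute location {'h','a'}? The result '0' would then become '1' and vice versa.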

@relation estimation
@attribute h.teamSize numeric
@attribute h.lineUpTeamFormation {'5-2-0-3-1' ... '6-2-0-4-1'}
@attribute h.teamRatingAVG numeric
@attribute h.teamRatingHighest numeric
@attribute h.teamRatingLowest numeric
@attribute h.teamRatingMed numeric
@attribute h.teamRatingMedRating numeric
@attribute h.lineUpTeamRating.att numeric
@attribute h.lineUpTeamRating.attMid numeric
@attribute h.lineUpTeamRating.mid numeric
@attribute h.lineUpTeamRating.defMid numeric
@attribute h.lineUpTeamRating.def numeric
@attribute h.lineUpTeamRatingAVG.att numeric
@attribute h.lineUpTeamRatingAVG.attMid numeric
@attribute h.lineUpTeamRatingAVG.mid numeric
@attribute h.lineUpTeamRatingAVG.defMid numeric
@attribute h.lineUpTeamRatingAVG.def numeric
@attribute h.lineUpTeamRatingHighest.att numeric
@attribute h.lineUpTeamRatingHighest.attMid numeric
@attribute h.lineUpTeamRatingHighest.mid numeric
@attribute h.lineUpTeamRatingHighest.defMid numeric
@attribute h.lineUpTeamRatingHighest.def numeric
@attribute h.lineUpTeamRatingLowest.att numeric
@attribute h.lineUpTeamRatingLowest.attMid numeric
@attribute h.lineUpTeamRatingLowest.mid numeric
@attribute h.lineUpTeamRatingLowest.defMid numeric
@attribute h.lineUpTeamRatingLowest.def numeric
@attribute a.teamSize numeric
@attribute a.lineUpTeamFormation {'5-2-0-3-1' ... '6-2-0-4-1'}
@attribute a.teamRatingAVG numeric
@attribute a.teamRatingHighest numeric
@attribute a.teamRatingLowest numeric
@attribute a.teamRatingMed numeric
@attribute a.teamRatingMedRating numeric
@attribute a.lineUpTeamRating.att numeric
@attribute a.lineUpTeamRating.attMid numeric
@attribute a.lineUpTeamRating.mid numeric
@attribute a.lineUpTeamRating.defMid numeric
@attribute a.lineUpTeamRating.def numeric
@attribute a.lineUpTeamRatingAVG.att numeric
@attribute a.lineUpTeamRatingAVG.attMid numeric
@attribute a.lineUpTeamRatingAVG.mid numeric
@attribute a.lineUpTeamRatingAVG.defMid numeric
@attribute a.lineUpTeamRatingAVG.def numeric
@attribute a.lineUpTeamRatingHighest.att numeric
@attribute a.lineUpTeamRatingHighest.attMid numeric
@attribute a.lineUpTeamRatingHighest.mid numeric
@attribute a.lineUpTeamRatingHighest.defMid numeric
@attribute a.lineUpTeamRatingHighest.def numeric
@attribute a.lineUpTeamRatingLowest.att numeric
@attribute a.lineUpTeamRatingLowest.attMid numeric
@attribute a.lineUpTeamRatingLowest.mid numeric
@attribute a.lineUpTeamRatingLowest.defMid numeric
@attribute a.lineUpTeamRatingLowest.def numeric
@attribute result {'0','1','x'}
@data
11.0,"4-1-1-4-1",1563.0046902930617,1716.018383910481,1493.642106150469,1542.5395864396032,1604.830245030475,1594.8952627985404,6230.782838756112,1552.485746007047,1716.018383910481,6098.869361751494,1594.8952627985404,1557.695709689028,1552.485746007047,1716.018383910481,1524.7173404378734,1594.8952627985404,1617.8284702417561,1552.485746007047,1716.018383910481,1542.4611979096933,1594.8952627985404,1493.642106150469,1552.485746007047,1716.018383910481,1510.4250125761928,11.0,"5-1-1-2-2",1588.961662996073,1747.6289170494754,1508.4062919834894,1565.5233012334515,1628.0176045164824,3459.80148294728,3079.552081457912,1542.4682316024448,1576.1754548839763,7820.5810420651915,1729.90074147364,1539.776040728956,1542.4682316024448,1576.1754548839763,1564.1162084130383,1747.6289170494754,1549.4953619285486,1542.4682316024448,1576.1754548839763,1613.8600439857894,1712.1725658978046,1530.0567195293636,1542.4682316024448,1576.1754548839763,1508.4062919834894,"x"
11.0,"4-2-2-2-1",1475.8094913912312,1502.0682887709222,1444.990021885439,1483.7603435487183,1473.5291553281807,1490.639636207262,2978.5093856157946,2950.4346148352724,2892.2037554297044,5922.117013215507,1490.639636207262,1489.2546928078973,1475.2173074176362,1446.1018777148522,1480.5292533038767,1490.639636207262,1492.9037337533382,1502.0682887709222,1447.2137335442653,1496.2886114276891,1490.639636207262,1485.6056518624566,1448.3663260643502,1444.990021885439,1460.927921231502,11.0,"4-1-2-2-2",1484.7390000692892,1512.2300048742143,1453.444107111614,1486.4669707831615,1482.837055992914,3013.771836727523,2964.5776806684476,2961.501146916992,1453.444107111614,5938.834229337606,1506.8859183637614,1482.2888403342238,1480.750573458496,1453.444107111614,1484.7085573344016,1512.2300048742143,1501.9409533482967,1493.2838448180084,1453.444107111614,1502.7776443004382,1501.5418318533088,1462.6367273201508,1468.2173020989835,1453.444107111614,1464.7837448131381,"1"
11.0,"6-0-1-2-2",1445.77970697302,1506.5657818615387,1393.7116666209088,1430.4622334716257,1450.1387242412238,2937.7942649521,3010.9183806060323,1402.8170557672368,0.0,8552.047075377852,1468.89713247605,1505.4591903030162,1402.8170557672368,NaN,1425.341179229642,1483.5459383871223,1506.5657818615387,1402.8170557672368,-1.0,1465.0738948215799,1454.248326564978,1504.3525987444937,1402.8170557672368,2.147483647E9,1393.7116666209088,11.0,"4-2-2-2-1",1430.4629022453128,1474.4893525633652,1404.2919287564614,1426.6619540429597,1439.3906406599133,1404.2919287564614,2864.6817220202643,2906.4018234232753,2831.550186683904,5728.166263814535,1404.2919287564614,1432.3408610101321,1453.2009117116377,1415.775093341952,1432.0415659536338,1404.2919287564614,1452.1579439472125,1474.4893525633652,1426.6619540429597,1458.4115214984754,1404.2919287564614,1412.5237780730517,1431.9124708599102,1404.8882326409444,1413.8219682802633,"x"
11.0,"6-1-1-2-1",1455.2875865157116,1533.8148260877508,1408.8080092768812,1454.6219157957269,1471.311417682316,1440.5588774260157,2975.472084744947,1454.6219157957269,1489.241573073469,8648.269000632668,1440.5588774260157,1487.7360423724735,1454.6219157957269,1489.241573073469,1441.3781667721114,1440.5588774260157,1533.8148260877508,1454.6219157957269,1489.241573073469,1475.4245410744663,1440.5588774260157,1441.6572586571963,1454.6219157957269,1489.241573073469,1408.8080092768812,11.0,"7-1-1-1-1",1478.6812699237746,1573.5345947486803,1376.2807543215677,1487.4841795952277,1474.907674535124,1573.5345947486803,1438.3659332206364,1510.946520366525,1376.2807543215677,10366.36616650411,1573.5345947486803,1438.3659332206364,1510.946520366525,1376.2807543215677,1480.90945235773,1573.5345947486803,1438.3659332206364,1510.946520366525,1376.2807543215677,1501.6224047599273,1573.5345947486803,1438.3659332206364,1510.946520366525,1376.2807543215677,1421.1718685458247,"0"
...

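I hope someone with experience can give me some advice. So, I am looking for:

  • a good way to handle numeric data
  • a good way to handle a large number of attributes

(I know this is not the best approach, but I am already quite happy with it :)

Kind regards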

1 Answer:

Answer 0: (score: 0)

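Trial and error is probably the best way to determine the "best" classifier. It really comes down to many factors, such as the layout and preprocessing of the data, the amount of data, and how well the problem suits the classifier.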
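On the preprocessing point: the third sample row in the question contains NaN, -1.0 and 2.147483647E9 in the defMid columns, which looks like what happens when a line has no players. Below is a minimal sketch, assuming those values are sentinels for "no data", that rewrites them to `?`, the ARFF missing-value marker. The sentinel set and the helper names are my own assumptions, not anything Weka defines, and the 0.0 sum in the same row is left alone on purpose.

```python
import math

# Assumption: these numbers mean "no player in this line". They are taken
# from the third sample row of the question, not from any Weka convention.
SENTINELS = {-1.0, 2147483647.0}


def clean_value(field):
    """Return '?' (ARFF missing value) for NaN or sentinel numbers."""
    try:
        value = float(field)
    except ValueError:
        # Quoted nominal values such as "4-1-1-4-1" or "x" are kept as-is.
        return field
    if math.isnan(value) or value in SENTINELS:
        return "?"
    return field


def clean_row(line):
    """Clean one @data line; assumes no commas inside quoted values."""
    return ",".join(clean_value(f) for f in line.strip().split(","))


if __name__ == "__main__":
    print(clean_row('1402.8170557672368,NaN,-1.0,2.147483647E9,"x"'))
```

Whether a missing value or some other encoding works better here is something to check by experiment, like the choice of classifier itself.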

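At a quick glance, you could try J48, a neural network, or an SVM. The only part that may need changing is the formation attributes (perhaps split each one into 5 numeric attributes?). Other than that, many classifiers can predict a nominal output from the numeric information provided.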
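Splitting the formation could look roughly like the sketch below. It assumes the five parts are player counts ordered def, defMid, mid, attMid, att; I inferred that order from the sample rows (for example, "6-0-1-2-2" lines up with the empty defMid columns), so treat it as a guess. The attribute names it generates are hypothetical.

```python
# Assumption: order inferred from the sample rows, not stated by the asker.
LINES = ["def", "defMid", "mid", "attMid", "att"]


def split_formation(formation):
    """Turn '4-1-1-4-1' (optionally quoted) into [4, 1, 1, 4, 1]."""
    counts = [int(part) for part in formation.strip("\"'").split("-")]
    if len(counts) != len(LINES):
        raise ValueError(f"expected {len(LINES)} parts, got {formation!r}")
    return counts


def formation_attributes(prefix):
    """ARFF header lines for the five (hypothetical) count attributes."""
    return [f"@attribute {prefix}.formation.{name} numeric" for name in LINES]


if __name__ == "__main__":
    print("\n".join(formation_attributes("h")))
    print(split_formation('"4-1-1-4-1"'))
```

The five numeric columns would replace h.lineUpTeamFormation and a.lineUpTeamFormation in each data row. Numeric counts let a classifier compare formations it has never seen as a whole, which a single nominal value with many distinct levels does not.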

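As for the home vs. away part, it looks fine as it is, and the extra attribute is best left out. These kinds of problems usually favour the home team, but the h. and a. prefixes already tell the model who is at home and who is away, so a location attribute shouldn't really add anything.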

Have a play with what is available and see how you go. The results may surprise you!