Why does gain.ratio give NaN?

Asked: 2016-05-17 12:45:55

Tags: r chi-squared information-gain

I am trying to evaluate attributes for feature selection. To do that I applied information.gain, gain.ratio and chi.squared, but some attributes come back as NaN or 0.0000000.

> weights <- information.gain(Team1.Result~., df)
> print(weights)
               attr_importance
Sr..No.            0.000000000
Matchid            0.000000000
Team2              0.171564805
Margin             0.344871508
Toss               0.004552660
Bat                0.006355032
Ground             0.324758562
Date               0.674699370
Team1.BatRate      0.000000000
Team1.Bat_SR       0.000000000
Team1.BowlRate     0.144960767
Team1.Bowl_SR      0.000000000
Team2.BatRate      0.000000000
Team2.Bat_SR       0.000000000
Team2.BowlRate     0.161264860
Team2.Bowl_SR      0.161264860

The gain ratio gives:

> weights <- gain.ratio(Team1.Result~., df)
> print(weights)
               attr_importance
Sr..No.                    NaN
Matchid                    NaN
Team2              0.075884914
Margin             0.107668123
Toss               0.006675310
Bat                0.009171368
Ground             0.133481349
Date               0.175239871
Team1.BatRate              NaN
Team1.Bat_SR               NaN
Team1.BowlRate     0.266415653
Team1.Bowl_SR              NaN
Team2.BatRate              NaN
Team2.Bat_SR               NaN
Team2.BowlRate     0.283865166
Team2.Bowl_SR      0.283865166

Chi-squared gives:

> res <- chi.squared(Team1.Result~., df)
> res
               attr_importance
Sr..No.              0.0000000
Matchid              0.0000000
Team2                0.5168656
Margin               0.7149496
Toss                 0.0951519
Bat                  0.1125653
Ground               0.7022298
Date                 1.0000000
Team1.BatRate        0.0000000
Team1.Bat_SR         0.0000000
Team1.BowlRate       0.4553474
Team1.Bowl_SR        0.0000000
Team2.BatRate        0.0000000
Team2.Bat_SR         0.0000000
Team2.BowlRate       0.4823412
Team2.Bowl_SR        0.4823412

Here are some records from the data (I wanted to add an image, but the site would not let me):

   Sr. No.  Matchid Team2   Margin  BR  Toss    Bat Ground  Date    Team1.BatRate   Team1.Bat_SR    Team1.BowlRate  Team1.Bowl_SR   Team2.BatRate   Team2.Bat_SR    Team2.BowlRate  Team2.Bowl_SR   Team1.Result
1   533280  New Zealand 13 runs NA  1   1   Pallekele   23-Sep-12   18.96866667 114.3413333 20.67066667 15.27333333 17.10866667 111.3693333 13.97666667 12.14666667 1
2   533283  Bangladesh  8 wickets   8   0   2   Pallekele   25-Sep-12   14.41333333 111.9113333 23.82466667 17.00666667 17.10866667 111.3693333 13.97666667 12.14666667 1
3   533286  South Africa    2 wickets   2   0   2   Colombo (RPS)   28-Sep-12   17.10866667 111.3693333 13.97666667 12.14666667 21.862  116.5413333 21.29266667 15.46   1
4   533291  India   8 wickets   18  1   1   Colombo (RPS)   30-Sep-12   22.37   104.772 25.52333333 19.29333333 17.10866667 111.3693333 13.97666667 12.14666667 0
5   533294  Australia   32 runs NA  0   1   Colombo (RPS)   2-Oct-12    18.36066667 114.2273333 22.80333333 18.42   17.10866667 111.3693333 13.97666667 12.14666667 1
6   533296  Sri Lanka   16 runs NA  0   2   Colombo (RPS)   4-Oct-12    17.10866667 111.3693333 13.97666667 12.14666667 15.936  100.616 15.75333333 13.16   0
7   562438  Sri Lanka   23 runs NA  1   1   Hambantota  3-Jun-12    14.425  98.111875   11.86875    10.33125    17.51142857 105.8635714 16.23214286 12.87857143 1

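Is it acceptable to use these results when they contain NaN? They do not look right to me. Also, is it possible for an attribute to score 1, as Date does under chi-squared?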
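For reference, here is a minimal base R sketch of one way a gain ratio can come out as NaN. It does not call FSelector. The helper names (entropy, info_gain, gain_ratio) are hypothetical, and it rests on an assumption: that the numeric columns are discretized before scoring, and that a column for which no cut point is found ends up in a single bin. Under that assumption the attribute entropy is 0, the information gain is 0, and the ratio is 0 / 0.

```r
# Shannon entropy (natural log) of a discrete vector.
# Hypothetical helper, not part of FSelector.
entropy <- function(x) {
  p <- table(x) / length(x)
  p <- p[p > 0]
  -sum(p * log(p))
}

# Information gain = H(class) + H(attr) - H(class, attr)
info_gain <- function(attr, class) {
  entropy(class) + entropy(attr) - entropy(paste(attr, class))
}

# Gain ratio = information gain / H(attr)
gain_ratio <- function(attr, class) {
  info_gain(attr, class) / entropy(attr)
}

class   <- c(1, 1, 1, 0, 1, 0, 1)      # Team1.Result from the seven sample rows
toss    <- c(1, 0, 0, 1, 0, 0, 1)      # Toss from the same rows
one_bin <- rep("all", length(class))   # assumed: attribute collapsed into one bin

info_gain(one_bin, class)    # 0
gain_ratio(one_bin, class)   # NaN, because 0 / 0
gain_ratio(toss, class)      # a finite value
```

If that assumption holds for this data, it would also explain why the attributes showing NaN for gain.ratio are exactly the ones showing 0 for information.gain.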

0 answers:

No answers yet