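I have a dataset in R with many variables of different types, and I am trying to use the smbinning package to calculate information value (IV).
I am using the following code:

```r
smbinning.sumiv(Sample, y = "flag")
```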
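This code produces an IV for most variables. For some of them, however, the "Process" column says "Too many categories", as shown in the output below: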
```
                      Char IV             Process
12            relationship NA Too many categories
15             nationality NA Too many categories
22       business_activity NA Too many categories
23 business_activity_group NA Too many categories
25         local_authority NA Too many categories
26           neighbourhood NA Too many categories
```
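You can find the columns that hit the limit by counting the distinct values in every factor or character column. The sketch below assumes the cut-off is 10 categories, which is the documented default of the `maxcat` argument of `smbinning.factor()`. Whether `smbinning.sumiv()` applies exactly this check is an assumption worth confirming in the package source. The helper `too_many_categories()` and the `toy` data frame are hypothetical.

```r
# Hypothetical helper: for each factor or character column, count the distinct
# non-missing values and keep only the columns with more than `maxcat`.
# maxcat = 10 is an assumption, chosen to mirror smbinning.factor()'s default.
too_many_categories <- function(df, maxcat = 10) {
  is_cat <- vapply(df, function(col) is.factor(col) || is.character(col),
                   logical(1))
  n_cat <- vapply(df[is_cat],
                  function(col) length(unique(col[!is.na(col)])),
                  integer(1))
  n_cat[n_cat > maxcat]
}

# Hypothetical toy data: `group` has 13 categories, `tenure` has 2.
toy <- data.frame(
  flag   = rep(c(0L, 1L), length.out = 26),
  group  = factor(rep(LETTERS[1:13], each = 2)),
  tenure = factor(rep(c("A", "B"), times = 13))
)

res <- too_many_categories(toy)
print(res)
```

On the real data the call would be `too_many_categories(Sample)`.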
For example, business_activity_group does not seem to have that many possible values:
```
     Affordable Rent Combined Commercial Community Combined
                         2546                             4
         Freeholders Combined                       Garages
                           23                             6
       General Needs Combined                     Keyworker
                        57140                           340
           Leasehold Combined        Market Rented Combined
                           88                          1463
       Older Persons Combined               Rent To Homebuy
                         4774                            76
    Shared Ownership Combined   Staff Acommodation Combined
                          167                             5
           Supported Combined
                         2892
```
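Counting the categories in the table above makes a likely cause concrete: business_activity_group has 13 distinct values. The sketch copies those counts, taken directly from the table, into a named vector and compares the number of categories with an assumed limit of 10.

```r
# Category counts for business_activity_group, copied from the printed table.
bag_counts <- c(
  "Affordable Rent Combined"      = 2546,
  "Commercial Community Combined" = 4,
  "Freeholders Combined"          = 23,
  "Garages"                       = 6,
  "General Needs Combined"        = 57140,
  "Keyworker"                     = 340,
  "Leasehold Combined"            = 88,
  "Market Rented Combined"        = 1463,
  "Older Persons Combined"        = 4774,
  "Rent To Homebuy"               = 76,
  "Shared Ownership Combined"     = 167,
  "Staff Acommodation Combined"   = 5,
  "Supported Combined"            = 2892
)

maxcat <- 10  # assumed limit, mirroring smbinning.factor()'s default maxcat
n_categories <- length(bag_counts)
cat("categories:", n_categories, "- exceeds limit:", n_categories > maxcat, "\n")
```

If the limit works this way, the number of distinct values triggers the message, not the volume in each one. Merging a few small groups would then only help once the total drops to 10 or fewer.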
I thought this might be caused by the low volume in some categories, so I tried combining some of them. This did not change the result.
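If merging categories did not change the result, two things are worth checking. Both are guesses, not confirmed by the question:

- The merged variable may still have more than 10 categories.
- The merge may have been done on a factor, which keeps its original levels even when no rows use them.

The hypothetical helper `lump_rare()` below rebuilds the factor from character values, so only the surviving categories remain as levels. The threshold `min_n = 100` is arbitrary.

```r
# A factor keeps its levels after values are reassigned:
f <- factor(c("a", "b", "c"))
f[f == "c"] <- "a"
n_before <- nlevels(f)              # still 3: "c" is now an empty level
n_after  <- nlevels(droplevels(f))  # 2 once unused levels are dropped

# Hypothetical helper: merge categories with fewer than `min_n` rows into
# `other`, then rebuild the factor so no empty levels are left behind.
lump_rare <- function(x, min_n = 100, other = "Other") {
  x <- as.character(x)
  freq <- table(x)                   # NA values are not counted
  rare <- names(freq)[freq < min_n]
  x[x %in% rare] <- other
  factor(x)
}

# Hypothetical toy variable: "c" and "d" are rare and get merged.
x <- factor(c(rep("a", 5), rep("b", 5), "c", "d"))
lumped <- lump_rare(x, min_n = 2)
print(table(lumped))
```

Take the counts shown for business_activity_group with `min_n = 100`. Six categories have fewer than 100 rows and would be merged, leaving 8 categories. The call would be `Sample$business_activity_group <- lump_rare(Sample$business_activity_group)`.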
Can anyone explain why "Too many categories" occurs, and what I can do with these variables to get an IV from the smbinning package?
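If the variables must keep more than 10 categories, the IV can also be computed outside the package. The sketch below is not smbinning's implementation. It applies the standard formula IV = Σ (p1 − p0) · ln(p1 / p0), where p1 and p0 are each category's share of the `y = 1` and `y = 0` rows. The sign convention chosen for the weight of evidence does not change the IV.

The function name `iv_factor()` is hypothetical. It assumes `y` is coded 0/1 and that every category contains both classes, because an empty cell makes the log infinite. Small groups such as Commercial Community Combined (4 rows) may need lumping first.

Separately, `smbinning.factor()` is documented with a `maxcat` argument. Calling it directly with a larger value may be another option, but check this against your installed version.

```r
# Information value of a categorical predictor x against a 0/1 target y:
#   IV = sum over categories of (p1 - p0) * log(p1 / p0)
# where p1 (p0) is the category's share of all y == 1 (y == 0) rows.
iv_factor <- function(x, y) {
  # factor(x) drops empty levels; NA values in x are excluded by table()
  tab <- table(factor(x), factor(y, levels = c(0, 1)))
  p0 <- tab[, "0"] / sum(tab[, "0"])
  p1 <- tab[, "1"] / sum(tab[, "1"])
  woe <- log(p1 / p0)  # weight of evidence; infinite if a cell count is zero
  sum((p1 - p0) * woe)
}

# Hypothetical toy example with two categories.
x <- c("a", "a", "a", "b", "b", "b")
y <- c(1, 1, 0, 1, 0, 0)
iv <- iv_factor(x, y)
print(iv)
```

On the real data the call would be `iv_factor(Sample$business_activity_group, Sample$flag)`.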