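I am trying to convert some of the variables in the GermanCredit data set from the caret package into factors. Using factors reduces the number of variables from 62 to 21.
The problem is that I get inconsistent results when summarizing the data for the "Purpose.X" columns: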
for (i in 20:30) { print(c(colnames(GermanCredit)[i], length( which(GermanCredit[,i] == 1) ) )) }
[1] "Purpose.NewCar" "234"
[1] "Purpose.UsedCar" "103"
[1] "Purpose.Furniture.Equipment" "181"
[1] "Purpose.Radio.Television" "280"
[1] "Purpose.DomesticAppliance" "12"
[1] "Purpose.Repairs" "22"
[1] "Purpose.Education" "50"
[1] "Purpose.Vacation" "0"
[1] "Purpose.Retraining" "9"
[1] "Purpose.Business" "97"
[1] "Purpose.Other" "12"
whereas the result of prop.table is
prop.table(table(Purpose))
             NewCar             UsedCar Furniture.Equipment    Radio.Television
              0.234               0.103               0.181               0.280
  DomesticAppliance             Repairs           Education            Vacation
              0.012               0.022               0.050               0.009
         Retraining            Business               Other
              0.097               0.012               0.000
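It looks like the results for Vacation through Other have been shifted by one position for some reason. Any help figuring out why the results are inconsistent would be greatly appreciated. Thanks.
- Purpose was obtained using the following loop: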
pcolnamerepeat = c("CheckingAccountStatus.", "CreditHistory.", "Purpose.", "SavingsAccountBonds.",
                   "EmploymentDuration.", "Personal.", "OtherDebtorsGuarantors.", "Property.", "OtherInstallmentPlans.", "Housing.", "Job.")
for (i in pcolnamerepeat) {
  rpt = grep(i, colnames(GermanCredit))
  tempfac <- factor(apply(GermanCredit[,rpt], 1, function(x) which(x == 1)))
  levels(tempfac) <- substr(colnames(GermanCredit[,rpt]), nchar(i)+1, nchar(colnames(GermanCredit[,rpt])) )
  GermanCredit <- cbind(GermanCredit[-c(rpt)], tempfac)
  names(GermanCredit)[length(GermanCredit)] <- substr(i, 1, nchar(i)-1 )
}
attach(GermanCredit) # Allows the columns to be accessed directly by name
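
One possible explanation, offered as a sketch rather than a confirmed diagnosis: Purpose.Vacation contains no 1s, so factor() builds levels only from the column indices that actually occur (10 of them), and assigning 11 names through levels<- relabels those levels by position, which would move every name after the empty column up by one slot. The toy data frame `toy`, its column names, and the variables `shifted` and `fixed` below are hypothetical stand-ins for the "Purpose." columns, not part of the original code. Passing `levels` and `labels` to factor() explicitly is one way to keep the empty category in its own slot.

```r
# Hypothetical toy data: three dummy columns, the middle one never equals 1
toy <- data.frame(
  Purpose.A = c(1, 0, 0, 1),
  Purpose.B = c(0, 0, 0, 0),
  Purpose.C = c(0, 1, 1, 0)
)
prefix <- "Purpose."
labs <- substr(colnames(toy), nchar(prefix) + 1, nchar(colnames(toy)))

# Column index of the 1 in each row (assumes exactly one 1 per row)
idx <- apply(toy, 1, function(x) which(x == 1))

# Approach from the loop above: only the observed indices (1 and 3) become
# levels, so the three names are matched by position to levels "1" and "3"
shifted <- factor(idx)
levels(shifted) <- labs

# Declaring every level up front keeps the empty category in place
fixed <- factor(idx, levels = seq_along(labs), labels = labs)

print(table(shifted))
print(table(fixed))
```

In this toy case table(shifted) reports the two C rows under B and leaves C at zero, while table(fixed) leaves B at zero and counts both rows under C. That is the same pattern as the Vacation through Other columns in the prop.table output.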