Question

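I am trying to convert some of the variables in the GermanCredit data from the caret package into factors. Using factors reduces the number of variables from 62 to 21.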

The problem is that I get inconsistent results when I summarize the data for the "Purpose.X" columns:

for (i in 20:30) { print(c(colnames(GermanCredit)[i], length( which(GermanCredit[,i] == 1) ) )) }
[1] "Purpose.NewCar" "234"           
[1] "Purpose.UsedCar" "103"            
[1] "Purpose.Furniture.Equipment" "181"                        
[1] "Purpose.Radio.Television" "280"                     
[1] "Purpose.DomesticAppliance" "12"                       
[1] "Purpose.Repairs" "22"             
[1] "Purpose.Education" "50"               
[1] "Purpose.Vacation" "0"               
[1] "Purpose.Retraining" "9"                 
[1] "Purpose.Business" "97"              
[1] "Purpose.Other" "12"

whereas the result of prop.table is

prop.table(table(Purpose))
             NewCar             UsedCar Furniture.Equipment    Radio.Television
              0.234               0.103               0.181               0.280
  DomesticAppliance             Repairs           Education            Vacation
              0.012               0.022               0.050               0.009
         Retraining            Business               Other
              0.097               0.012               0.000

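It looks like the results for Vacation through Other have been shifted by one position for some reason: Vacation shows Retraining's share, Retraining shows Business's, Business shows Other's, and Other shows 0. Any help figuring out why the results are inconsistent would be greatly appreciated. Thanks.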
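A shift of this kind can be reproduced on a small made-up example. The sketch below is not the GermanCredit data: `dummies`, `idx` and `f` are hypothetical names, and the toy columns only mimic a dummy-coded group in which one category never occurs. It shows that `factor()` creates levels only for values that are present, so renaming the levels by position moves every later name up by one and leaves the last name empty.

```r
# Toy dummy-coded data (hypothetical, not GermanCredit): column "B" is never 1
dummies <- data.frame(
  Purpose.A = c(1, 0, 0, 0),
  Purpose.B = c(0, 0, 0, 0),
  Purpose.C = c(0, 1, 1, 0),
  Purpose.D = c(0, 0, 0, 1)
)

# Column index of the 1 in each row: 1, 3, 3, 4 (index 2 never occurs)
idx <- apply(dummies, 1, function(x) which(x == 1))

# factor() only creates levels for values that actually occur: "1", "3", "4"
f <- factor(idx)
print(levels(f))

# Renaming by position maps "3" to "B" and "4" to "C"; "D" becomes an empty level
levels(f) <- c("A", "B", "C", "D")
print(table(f))

# True per-column counts, for comparison
print(colSums(dummies == 1))
```

In this toy case the table reports B = 2, C = 1, D = 0, while the real column counts are B = 0, C = 2, D = 1, which is the same pattern as the Vacation through Other discrepancy.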

- Purpose was obtained using the following loop:

library(caret)
data(GermanCredit)

pcolnamerepeat = c("CheckingAccountStatus.", "CreditHistory.", "Purpose.", "SavingsAccountBonds.",
    "EmploymentDuration.", "Personal.", "OtherDebtorsGuarantors.", "Property.", "OtherInstallmentPlans.", "Housing.", "Job.")
for (i in pcolnamerepeat) {
    # Columns that belong to this dummy-coded group
    rpt = grep(i, colnames(GermanCredit))

    # Position (within the group) of the column equal to 1 in each row
    tempfac <- factor(apply(GermanCredit[,rpt], 1, function(x) which(x == 1)))

    # Rename the levels by position, using the column names without the prefix
    levels(tempfac) <- substr(colnames(GermanCredit[,rpt]), nchar(i)+1, nchar(colnames(GermanCredit[,rpt])) )

    # Replace the dummy columns with the new factor, named after the prefix
    GermanCredit <- cbind(GermanCredit[-c(rpt)], tempfac)

    names(GermanCredit)[length(GermanCredit)] <- substr(i, 1, nchar(i)-1 )
}
attach(GermanCredit) # Makes easy access to the columns
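One way the factor construction in such a loop could be made robust to categories that never occur is to pass `levels` and `labels` to `factor()` explicitly instead of renaming afterwards. This is a minimal sketch on made-up data, not a tested change to the loop above: `dummies`, `prefix`, `labs` and `purpose` are hypothetical names, and it assumes every row has exactly one 1 within the group (otherwise `apply()` returns a list rather than a vector).

```r
# Hypothetical dummy-coded group; "Purpose.B" is never 1
dummies <- data.frame(
  Purpose.A = c(1, 0, 0, 0),
  Purpose.B = c(0, 0, 0, 0),
  Purpose.C = c(0, 1, 1, 0),
  Purpose.D = c(0, 0, 0, 1)
)
prefix <- "Purpose."

# Position of the 1 in each row (assumes exactly one 1 per row)
idx <- apply(dummies, 1, function(x) which(x == 1))

# Category names: column names with the prefix stripped
labs <- substring(colnames(dummies), nchar(prefix) + 1)

# Declare every column index as a level and label it in the same call,
# so an unused category keeps its slot instead of shifting later names
purpose <- factor(idx, levels = seq_along(labs), labels = labs)

print(table(purpose))
print(prop.table(table(purpose)))
```

With the levels declared up front, the unused category B appears with a count of 0 and the remaining counts line up with the per-column sums.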

R: Inconsistent results from 'which' and 'prop.table' on the GermanCredit data from the caret package

0 answers: