How can I scale a stacked-model approach to every country in a dataset?

Time: 2018-07-23 03:42:03

Tags: r machine-learning statistics ensemble-learning

fitControl <- trainControl(
  method = "cv",
  number = 5,
  savePredictions = 'final',
  classProbs = FALSE)

predictors<-c("Age", "Quantile","label1","label2")
outcomeName<-'Life_expt'

model_rf<-train(Life_expt ~ Age+Quantile+label1+label2,Train2[country==.BY],method='rf',trControl=fitControl,tuneLength=3)
  

Error in .prepareFastSubset(isub = isub, x = x, enclos = parent.frame(), : RHS of == is length 0 which is not 1 or nrow (559). For robustness, no recycling is allowed (other than of length 1 RHS). Consider %in% instead.

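I am trying to extend this to every country. I want to use a stacking approach with several models (rf, svmRadial, glm). How can I run this for each country without the error?

Thanks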

1 answer:

Answer 0: (score: 0)

You can scale within each country by passing the country factor to the by function:

# Data frame simulation
set.seed(123)
source <- data.frame(Country = factor(rep(c("Morocco", "Egypt", "Somali"), each = 4)),
                 Age = sample(18:100, 12, replace = TRUE),
                 Quantile = sample(1:100 / 100, 12, replace = TRUE))

# Apply scale for each group of rows with the same value of factor (country)
df_lists <- by(data = source, INDICES = source$Country, FUN = function(x){
  x$Scaled <- scale(x[, c(2, 3)])
  x
  }
)

# Combine data frame from the list
Train2 <- do.call(rbind, df_lists)
Train2

Output:

          Country Age Quantile  Scaled.Age Scaled.Quantile
Egypt.5     Egypt  42     0.15 -0.68003487     -1.47468600
Egypt.6     Egypt  30     0.42 -1.03102061      0.62092042
Egypt.7     Egypt  97     0.42  0.92864977      0.62092042
Egypt.8     Egypt  92     0.37  0.78240571      0.23284516
Morocco.1 Morocco  72     0.76  0.43965779      1.47808959
Morocco.2 Morocco  76     0.22  1.14311024     -0.65035942
Morocco.3 Morocco  63     0.32 -1.14311024     -0.25620220
Morocco.4 Morocco  67     0.24 -0.43965779     -0.57152797
Somali.9   Somali  75     0.16  0.56497964     -0.61136845
Somali.10  Somali  84     0.14  0.88278069     -0.74355623
Somali.11  Somali  20     0.24 -1.37713788     -0.08261736
Somali.12  Somali  57     0.47 -0.07062246      1.43754204
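
The same split-by-country idea can be carried over to model fitting. The following is only a sketch under stated assumptions: the data frame is a simulated stand-in for Train2 (the real columns label1 and label2 are left out because their contents are unknown), each country has enough rows for 5-fold cross-validation, and the caret, randomForest and kernlab packages are installed. The names by_country, methods and models are illustrative, not from the question.

```r
library(caret)  # assumption: caret, randomForest and kernlab are installed

set.seed(123)

# Hypothetical stand-in for Train2: 3 countries, 60 rows each
Train2 <- data.frame(
  country  = rep(c("Morocco", "Egypt", "Somali"), each = 60),
  Age      = sample(18:100, 180, replace = TRUE),
  Quantile = runif(180)
)
# Simulated outcome, for illustration only
Train2$Life_expt <- 80 - 0.1 * Train2$Age + 5 * Train2$Quantile + rnorm(180)

fitControl <- trainControl(
  method = "cv",
  number = 5,
  savePredictions = "final")

# One data frame per country
by_country <- split(Train2, Train2$country)

# Base learners named in the question
methods <- c("rf", "svmRadial", "glm")

# For each country, fit every method on that country's rows only
models <- lapply(by_country, function(d) {
  fits <- lapply(methods, function(m) {
    train(Life_expt ~ Age + Quantile, data = d,
          method = m, trControl = fitControl, tuneLength = 3)
  })
  setNames(fits, methods)
})

# Example access: the random forest trained on Egypt's rows
class(models$Egypt$rf)
```

The error in the question most likely arises because .BY is only defined inside the j expression of a data.table call that has a by argument, so country == .BY compares against an empty value when used in i. Splitting the data first avoids that. Because savePredictions = "final" keeps the held-out predictions of each base model, they can then be fed to a meta-model per country; the caretEnsemble package (caretList, caretStack) is one option for that step, though it is not shown here.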