Why do the ROSE and SMOTE functions change numeric values?

Posted: 2018-12-27 13:00:41

Tags: r modeling oversampling

The ROSE and SMOTE functions are changing numeric values, and I don't know why.

I'm trying to predict hospital readmission in diabetic inpatients, and the dataset is highly imbalanced, so I want to use an oversampling method to balance it. I've already tried both the ROSE and SMOTE functions.

My dataset contains only numeric values (dummy variables), because I want to apply xgboost. But I notice that ROSE and SMOTE generate non-integer values in variables that are binary.
Should these functions change the original values?
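One way to see where the fractional values come from: SMOTE creates each synthetic minority row by interpolating between a minority row and one of its nearest minority neighbours, x_new = x + u * (x_nn - x), with u drawn from Uniform(0, 1). When the two rows disagree on a dummy, the result lies strictly between 0 and 1. The base-R sketch below uses a hypothetical helper `smote_point` (not part of any package) and made-up values for `insulin` and `age`, purely for illustration.

```r
# Hypothetical helper illustrating SMOTE's interpolation step (not a package API)
smote_point <- function(x, x_nn, u = runif(1)) {
  # The synthetic point lies on the segment between row x and its neighbour x_nn
  x + u * (x_nn - x)
}

# Two made-up minority rows that disagree on the binary "insulin" dummy
x    <- c(insulin = 0, age = 60)
x_nn <- c(insulin = 1, age = 70)

# Fix u = 0.3 so the result is reproducible
new_row <- smote_point(x, x_nn, u = 0.3)
print(new_row)
```

With u = 0.3 the dummy becomes 0.3, which is the kind of value seen in the oversampled data; only pairs of rows that share the same dummy value keep it at exactly 0 or 1.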

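library(rsample)     # initial_split(), training(), testing()
library(data.table)  # as.data.table()
library(ROSE)        # ROSE()

set.seed(1994)
d_split <- initial_split(d.fin, prop = .8)   # 80/20 train/test split
train <- as.data.table(training(d_split))
test  <- as.data.table(testing(d_split))

# Oversample the training set with ROSE; the balanced sample is in the $data component
data.rose <- as.data.table(ROSE(readmitted ~ ., data = train, seed = 3)$data)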
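ROSE, by contrast, generates synthetic rows with a smoothed bootstrap: it resamples existing rows and then perturbs each numeric predictor with kernel noise, so a 0/1 column comes back continuous. The sketch below is a simplified illustration under stated assumptions (a single column, Gaussian noise, a fixed bandwidth `h = 0.2`); the helper `smoothed_bootstrap` is hypothetical, and ROSE's own bandwidth selection (scaled via its `hmult.majo` / `hmult.mino` arguments) differs.

```r
# Hypothetical, simplified smoothed bootstrap (assumption: Gaussian kernel, fixed h);
# this is NOT ROSE's actual implementation.
smoothed_bootstrap <- function(x, n, h = 0.2) {
  picked <- sample(x, n, replace = TRUE)  # ordinary bootstrap draw of existing values
  picked + rnorm(n, mean = 0, sd = h)     # kernel noise turns 0/1 into continuous values
}

set.seed(3)
insulin   <- c(0, 1, 1, 0, 1)             # made-up binary dummy
synthetic <- smoothed_bootstrap(insulin, n = 10)
print(round(synthetic, 3))
```

Because the noise is added after resampling, essentially none of the generated values is exactly 0 or 1, which mirrors what happens to the `insulin` dummy in the ROSE output.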

Original dataset:

[image: values of the binary "insulin" variable in the original dataset]

After applying the ROSE function:

[image: values of the binary "insulin" variable after applying ROSE]
