Correct split of dependent variable values in machine learning?

Time: 2018-12-03 13:16:12

Tags: python python-3.x machine-learning oversampling

I am building a machine learning model in Python, and the data set contains only categorical variables. I want a precision of at least 90% for the value 1 of the dependent variable.
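To be clear about the metric: by precision for a class I mean the share of predictions of that class that turn out to be correct. A minimal sketch of that calculation, assuming plain Python lists; the names `y_true`, `y_pred` and `precision_for_class` and the toy labels are made up for illustration and are not from my actual code or data:

```python
def precision_for_class(y_true, y_pred, label):
    """Precision for `label`: correct predictions of `label` / all predictions of `label`."""
    # True labels of every row that the model predicted as `label`.
    truths = [t for t, p in zip(y_true, y_pred) if p == label]
    if not truths:
        # No predictions of this label: precision is undefined, 0.0 is returned here.
        return 0.0
    return sum(1 for t in truths if t == label) / len(truths)


# Toy labels for illustration only (not real data).
y_true = [1, 1, 1, 0, 0, 1, 0, 1, 1, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1, 1, 1]

precision_1 = precision_for_class(y_true, y_pred, 1)  # 6 of the 7 predicted 1s are correct
precision_0 = precision_for_class(y_true, y_pred, 0)  # 2 of the 3 predicted 0s are correct
```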

In the original data (the raw YTD data that I pulled from the database), the ratio of 1 to 0 was 61:39, but this varies; two months ago it was 75:25. I was not getting the precision I wanted with this data. After some trial and error, I found that when the ratio of 1 to 0 is 85:15, I get a precision above 90% for both 1 and 0. In other words, more than 90% of the predictions of 1 and more than 90% of the predictions of 0 were correct. Mind you, I have not done any oversampling or undersampling. I simply deleted some rows where the dependent variable is 0 to bring the ratio of 1 to 0 to 85:15.
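The row deletion described above can be sketched roughly as follows. This is only an illustration under assumptions: the function name `downsample_zeros`, the fixed seed, and the choice to drop 0 rows at random are hypothetical, and the labels are a synthetic 61:39 list rather than my real data.

```python
import random


def downsample_zeros(labels, ones_share=0.85, seed=0):
    """Return sorted row indices to keep so that 1s are about `ones_share` of the kept rows."""
    ones = [i for i, y in enumerate(labels) if y == 1]
    zeros = [i for i, y in enumerate(labels) if y == 0]
    # Number of 0 rows needed for the target ratio, e.g. 85:15 -> n_ones * 15 / 85.
    n_zeros_keep = min(len(zeros), round(len(ones) * (1 - ones_share) / ones_share))
    # Assumption: the 0 rows to keep are picked at random (seeded for repeatability).
    rng = random.Random(seed)
    kept_zeros = rng.sample(zeros, n_zeros_keep)
    return sorted(ones + kept_zeros)


# Synthetic labels with the 61:39 ratio mentioned above.
labels = [1] * 61 + [0] * 39
kept = downsample_zeros(labels)

n_ones = sum(labels[i] for i in kept)
n_zeros = len(kept) - n_ones
```

With 61 rows of 1, keeping 11 rows of 0 gives 61:11, which is roughly 85:15; all rows of 1 are retained and only rows of 0 are removed.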

I want to know whether this approach is correct.

Thanks

0 answers:

No answers