I'm new to Python and data science, and I'm trying to run a multilabel classification. However, I have over 2,000,000 observations and 230 categories to predict. The main problem is that my sparse matrix consists almost entirely of zeros, so the accuracy will be monstrously high even for a model that classifies everything as 0.
For example, the category "animals" appears only 11,340 times, so there will be over 1.9 million zeros in that category.
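To make those numbers concrete, here is a minimal sketch of the arithmetic for that one category. It assumes exactly 2,000,000 rows (my real count is somewhat higher) and a hypothetical classifier that predicts 0 for every row. Recall is included only as a contrast metric; it is not something I have measured on my data.

```python
# Counts taken from the post; n_total is an assumed round lower bound.
n_total = 2_000_000
n_positive = 11_340  # rows labeled "animals"

# A hypothetical classifier that predicts 0 for every row gets all
# negatives right and all positives wrong.
true_negatives = n_total - n_positive
true_positives = 0

# Accuracy counts every correct prediction, so the zeros dominate it.
accuracy = (true_positives + true_negatives) / n_total

# Recall only looks at the positive rows, so it exposes the problem.
recall = true_positives / n_positive

print(f"accuracy = {accuracy:.4f}")
print(f"recall   = {recall:.4f}")
```

Under these assumptions the all-zero prediction reaches about 99.4% accuracy on "animals" while finding none of the 11,340 positive rows.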
Is there a way to reduce this effect? I have tried binary relevance, Naive Bayes, and some other approaches, but I think the main issue is the data frame itself.