Ignoring a set of categories

Posted: 2018-02-11 18:11:38

Tags: python-3.x pandas scikit-learn

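I want to ignore several categories in one Excel column (Category) of my dataset. I had to remove "apple" (one of the categories in the dataset), and that is already done in the code. But how do I remove a whole set of categories? I tried using a list and a set, but neither worked.

For example, I want to remove these categories: ["Mango", "orange", ...]. How can I do this efficiently? Thanks in advance.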

Dataset sample:

+----------------------+------------+
| Details              | Category   |
+----------------------+------------+
| Any raw text1        | Mango      |
+----------------------+------------+
| any raw text2        | Apple      |
+----------------------+------------+
| any raw text5        | Apple      |
+----------------------+------------+
| any raw text7        | Apple      |
+----------------------+------------+
| any raw text8        | Mango      |
+----------------------+------------+
| Any raw text4        | Berry      |
+----------------------+------------+
| any raw text5        | Orange     |
+----------------------+------------+
| any raw text6        | Apple      |
+----------------------+------------+
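To experiment without `Mydataset.xls`, the sample table above can be rebuilt as an in-memory DataFrame. This is a minimal sketch: only the two columns shown are reproduced, and the variable name `data` is chosen to match the question's code.

```python
import pandas as pd

# Rebuild the sample rows from the table above (Details, Category)
data = pd.DataFrame({
    "Details": [
        "Any raw text1", "any raw text2", "any raw text5", "any raw text7",
        "any raw text8", "Any raw text4", "any raw text5", "any raw text6",
    ],
    "Category": [
        "Mango", "Apple", "Apple", "Apple",
        "Mango", "Berry", "Orange", "Apple",
    ],
})

# Count how many rows fall into each category
print(data["Category"].value_counts())
```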

My code sample:

import pandas as pd
import numpy as np
import scipy as sp
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

# The file is read as tab-separated text, keeping only two columns
data = pd.read_csv('Mydataset.xls', delimiter='\t',
                   usecols=['Details', 'Category'], encoding='utf-8')

target_one = data['Category']              # the label column
target_list = data['Category'].unique()    # distinct category names

# Drop one category per line
data = data[data.Category != "Apple"]
data = data[data.Category != "Mango"]
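The repeated `!=` lines above can be generalized to any number of categories by looping over a list. This is a sketch that assumes a small hypothetical DataFrame in place of `Mydataset.xls`. It builds a new filtered frame on every pass, so it is simple but does more work than a single vectorized mask when the list is long.

```python
import pandas as pd

# Hypothetical stand-in for the question's DataFrame
data = pd.DataFrame({
    "Details": ["text1", "text2", "text3", "text4", "text5"],
    "Category": ["Mango", "Apple", "Berry", "Orange", "Apple"],
})

# Categories to drop; each pass applies the same != filter as the question
categories_to_drop = ["Apple", "Mango", "Orange"]

filtered = data
for category in categories_to_drop:
    # keep only rows whose Category differs from the current value
    filtered = filtered[filtered["Category"] != category]

print(filtered)
```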
-----------------------------------

1 answer:

Answer 0 (score: 1)

You need something like this:

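# list of categories to remove
category_toremove = ['Apple', 'Mango', 'Orange']

# df is the DataFrame from the question (named `data` there).
# isin() flags rows whose Category is in the list; ~ inverts that mask,
# so only rows with the remaining categories are kept
df = df[~df['Category'].isin(category_toremove)]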
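A self-contained sketch of the answer's approach. The sample rows are hypothetical, and the list deliberately uses lowercase `"orange"` as written in the question: `isin()` compares strings exactly, so `"Orange"` rows would survive the filter. The second filter lower-cases the column with `str.lower()` before matching, which assumes the Category column holds only strings.

```python
import pandas as pd

# Hypothetical stand-in DataFrame with categories like the question's sample
df = pd.DataFrame({
    "Details": ["text1", "text2", "text3", "text4", "text5"],
    "Category": ["Mango", "Apple", "Berry", "Orange", "Apple"],
})

category_toremove = ["Apple", "Mango", "orange"]  # note the lowercase "orange"

# isin() is case-sensitive: "orange" does not match "Orange"
exact = df[~df["Category"].isin(category_toremove)]

# Case-insensitive variant: lower-case both sides before comparing
to_remove_lower = {c.lower() for c in category_toremove}
relaxed = df[~df["Category"].str.lower().isin(to_remove_lower)]

print(exact["Category"].tolist())    # ['Berry', 'Orange']
print(relaxed["Category"].tolist())  # ['Berry']
```

A single boolean mask like this filters the whole column in one vectorized pass, which is why it is preferable to chaining one `!=` filter per category.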