Question

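I want to ignore several categories in one Excel column (Category) of my dataset. I had to remove "Apple" (one category in the dataset), which is already done in the code. But how do I remove a whole group of categories? I tried using lists and sets, but neither worked.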

For example, I want to remove these categories: ["Mango", "orange", ...]. How can I do this efficiently? Thanks in advance.

Sample of the dataset:

+----------------------+------------+
| Details              | Category   |
+----------------------+------------+
| Any raw text1        | Mango      |
+----------------------+------------+
| any raw text2        | Apple      |
+----------------------+------------+
| any raw text5        | Apple      |
+----------------------+------------+
| any raw text7        | Apple      |
+----------------------+------------+
| any raw text8        | Mango      |
+----------------------+------------+
| Any raw text4        | Berry      |
+----------------------+------------+
| any raw text5        | Orange     |
+----------------------+------------+
| any raw text6        | Apple      |
+----------------------+------------+

Sample of my code:

import pandas as pd
import numpy as np
import scipy as sp
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt  

# read only the two columns of interest from the tab-delimited file
data = pd.read_csv('Mydataset.xls', delimiter='\t',
                   usecols=['Details', 'Category'], encoding='utf-8')

target_one = data['Category']
target_list = data['Category'].unique()

# drop one category at a time
data = data[data.Category != "Apple"]
data = data[data.Category != "Mango"]
-----------------------------------

Answer 1

You need something like this:

# list of categories to be removed
categories_to_remove = ['Apple', 'Mango', 'Orange']

# isin() marks rows whose Category is in the list; ~ negates the mask to keep the rest
data = data[~data['Category'].isin(categories_to_remove)]
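One thing to watch for: the list in the question spells "orange" in lowercase, while the sample data has "Orange", and isin() matches exactly. The sketch below rebuilds the sample table as a small DataFrame (the variable names exact, relaxed, and lowered are only illustrative) and compares an exact filter with a case-insensitive one that lowercases both sides before matching.

```python
import pandas as pd

# Toy data mirroring the sample table in the question.
data = pd.DataFrame({
    "Details": ["Any raw text1", "any raw text2", "any raw text5",
                "any raw text7", "any raw text8", "Any raw text4",
                "any raw text5", "any raw text6"],
    "Category": ["Mango", "Apple", "Apple", "Apple",
                 "Mango", "Berry", "Orange", "Apple"],
})

# Categories to drop, spelled as in the question (note the lowercase "orange").
categories_to_remove = ["Mango", "orange"]

# Exact match: "orange" does not match "Orange", so Orange rows survive.
exact = data[~data["Category"].isin(categories_to_remove)]

# Case-insensitive match: lowercase both the column and the list first.
lowered = {c.lower() for c in categories_to_remove}
relaxed = data[~data["Category"].str.lower().isin(lowered)]

print(exact["Category"].tolist())
print(relaxed["Category"].tolist())
```

If the spellings in the list are known to match the data exactly, the plain isin() form is enough and the lowercasing step can be skipped.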
