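I want to ignore several categories in one Excel column (Category) of my dataset. I had to remove "Apple" (one of the categories in the dataset), and that is already done in the code. But how do I remove a whole group of categories? I tried using lists and sets, but neither worked.
For example, I want to remove these categories:
["Mango", "orange", ...]
How can I do this efficiently?
Thanks in advance.
Dataset example: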
+----------------------+------------+
| Details | Category |
+----------------------+------------+
| Any raw text1 | Mango |
+----------------------+------------+
| any raw text2 | Apple |
+----------------------+------------+
| any raw text5 | Apple |
+----------------------+------------+
| any raw text7 | Apple |
+----------------------+------------+
| any raw text8 | Mango |
+----------------------+------------+
| Any raw text4 | Berry |
+----------------------+------------+
| any raw text5 | Orange |
+----------------------+------------+
| any raw text6 | Apple |
+----------------------+------------+
My code sample:
import pandas as pd
import numpy as np
import scipy as sp
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
data = pd.read_csv('Mydataset.xls', delimiter='\t',
                   usecols=['Details', 'Category'], encoding='utf-8')
target_one = data['Category']
target_list = data['Category'].unique()
data = data[data.Category != "Apple"]
data = data[data.Category != "Mango"]
-----------------------------------
Answer 0 (score: 1)
You need something like this:
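# list of categories to be removed
category_toremove = ['Apple', 'Mango', 'Orange']
# isin() marks rows whose Category is in the list; ~ inverts the mask,
# so only the rows NOT in the list are kept
# (`data` is the DataFrame loaded in the question)
data = data[~data['Category'].isin(category_toremove)]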
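
One thing to watch for: isin() compares strings exactly, so the lowercase "orange" in the question's list would not match "Orange" in the data. Below is a minimal, self-contained sketch of both behaviours. The DataFrame is rebuilt by hand from the sample table in the question, and the names categories_to_remove, exact and relaxed are made up for illustration.

```python
import pandas as pd

# Toy data copied from the sample table in the question.
data = pd.DataFrame({
    "Details": ["Any raw text1", "any raw text2", "any raw text5", "any raw text7",
                "any raw text8", "Any raw text4", "any raw text5", "any raw text6"],
    "Category": ["Mango", "Apple", "Apple", "Apple",
                 "Mango", "Berry", "Orange", "Apple"],
})

# Hypothetical list of categories to drop (note the lowercase "orange").
categories_to_remove = ["Mango", "orange"]

# Exact match: isin() is case-sensitive, so "Orange" rows survive here.
exact = data[~data["Category"].isin(categories_to_remove)]

# Case-insensitive match: lowercase both sides before comparing.
lowered = [c.lower() for c in categories_to_remove]
relaxed = data[~data["Category"].str.lower().isin(lowered)]

print(exact["Category"].tolist())
print(relaxed["Category"].tolist())
```

If the categories in the file are spelled consistently, the plain isin() version is enough; the lowercase comparison only matters when the casing in the list and in the column can differ.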