I have a DataFrame with 25,000 rows and two columns (text, class). The class column takes several values, e.g. [A, B, C].
import pandas as pd

# Raw string so the backslash in the Windows path is not read as an escape
data = pd.read_csv(r'E:\mydata.txt', sep="*")
data.columns = ["text", "class"]
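For example, I need to remove 10 rows of class A and 15 rows of class B.
Answer 0 (score: 1)
You can do this with conditional slicing and the DataFrame's index attribute: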
remove_n = 10
remove_class = 'A'
# First find the index labels of the rows whose class equals the one you want to trim,
# then keep only the first n of those labels.
index_to_drop = data.index[data['class'] == remove_class][:remove_n]
# Finally, drop those rows.
data = data.drop(index_to_drop)
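The question asks to trim two classes at once, so the same idea can be wrapped in a loop. The following is only a minimal sketch: the helper name `drop_first_n_per_class`, the `counts` dictionary, and the toy data are illustrative assumptions, not part of the original answer. It also assumes the DataFrame has a unique index, since `drop` removes every row that shares a dropped label.

```python
import pandas as pd

def drop_first_n_per_class(df, counts, column="class"):
    """Return a copy of df without the first n rows of each class in counts.

    Hypothetical helper; assumes df has a unique index.
    """
    to_drop = []
    for label, n in counts.items():
        # Index labels of the rows belonging to this class, in original order
        class_index = df.index[df[column] == label]
        # Take the first n labels (fewer if the class has fewer than n rows)
        to_drop.extend(class_index[:n])
    return df.drop(index=to_drop)

# Toy data standing in for the 25,000-row frame from the question
data = pd.DataFrame({
    "text": [f"row {i}" for i in range(12)],
    "class": ["A", "B", "C"] * 4,
})
trimmed = drop_first_n_per_class(data, {"A": 2, "B": 3})
print(trimmed["class"].value_counts().to_dict())
```

For the case in the question the call would be `drop_first_n_per_class(data, {"A": 10, "B": 15})`. Classes not listed in `counts` are left untouched.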
Answer 1 (score: 0)
You can build a Boolean mask with groupby.cumcount and np.logical_or.reduce, then apply it to your DataFrame via iloc:
import numpy as np
import pandas as pd

# example dataframe
df = pd.DataFrame({'group': np.random.randint(0, 3, 100),
                   'value': np.random.random(100)})
print(df.shape) # (100, 2)
# criteria input
criteria = {1: 10, 2: 15}
# cumulative count by group
cum_count = df.groupby('group').cumcount()
# Boolean mask, negative via ~
conditions = [(df['group'].eq(k) & cum_count.lt(v)) for k, v in criteria.items()]
mask = ~np.logical_or.reduce(conditions)
# apply Boolean mask
res = df.iloc[mask]
print(res.shape)  # (75, 2), provided groups 1 and 2 have at least 10 and 15 rows respectively
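As a possible variant of the mask above, the per-group quota can be looked up with `Series.map` instead of building one condition per key. This is a sketch under assumptions: the function name `drop_first_n_by_map` and the small deterministic frame are made up for illustration, and groups missing from `criteria` are assumed to keep all their rows.

```python
import pandas as pd

def drop_first_n_by_map(df, criteria, column="group"):
    """Hypothetical helper: drop the first criteria[g] rows of each group g."""
    # Position of each row within its own group: 0, 1, 2, ...
    position = df.groupby(column).cumcount()
    # Per-row removal quota; groups absent from criteria get 0 (nothing removed)
    quota = df[column].map(criteria).fillna(0)
    # Keep a row only once its within-group position has reached the quota
    return df[position >= quota]

# Small deterministic frame so the result can be checked by hand
df = pd.DataFrame({"group": [0, 1, 1, 2, 1, 2, 2, 0], "value": range(8)})
res = drop_first_n_by_map(df, {1: 2, 2: 1})
print(res)
```

Here the first two rows of group 1 and the first row of group 2 are removed, while group 0 is untouched. Because the mask is a pandas Series aligned on the index, it is applied with plain Boolean indexing rather than `iloc`.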