Delete a number of rows from a DataFrame

Time: 2018-12-01 18:39:01

Tags: python python-3.x pandas dataframe

I have a DataFrame with 25,000 rows and two columns (text, class). The class column holds several labels, e.g. [A, B, C].

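import pandas as pd
# raw string so the backslash in the Windows path is not read as an escape sequence
data = pd.read_csv(r'E:\mydata.txt', sep="*")
data.columns = ["text", "class"]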

For example, I need to delete 10 rows of class A and 15 rows of class B.

2 answers:

Answer 0 (score: 1)

You can do this with conditional slicing and the DataFrame's index attribute:

remove_n = 10
remove_class = 'A'  # the class label as it appears in data['class']
# First find the index labels where class equals the class you want to drop,
# then keep only the first n of them
index_to_drop = data.index[data['class'] == remove_class][:remove_n]
# Finally, drop those rows by label
data = data.drop(index_to_drop)
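The question asks for different counts per class, so the same slicing idea can be repeated for each class before a single drop call. This is a minimal sketch on a hypothetical 60-row toy DataFrame standing in for the real file; the dict name to_remove is illustrative only. It assumes the index labels are unique (true for the default RangeIndex that read_csv produces), because drop works by label.

```python
import pandas as pd

# Hypothetical toy data standing in for the question's 25,000-row file
data = pd.DataFrame({
    "text": [f"row {i}" for i in range(60)],
    "class": ["A", "B", "C"] * 20,  # 20 rows of each class
})

# How many rows to drop per class (the question's example: 10 of A, 15 of B)
to_remove = {"A": 10, "B": 15}

# Collect the first n index labels of each class, then drop them in one call
index_to_drop = []
for cls, n in to_remove.items():
    index_to_drop.extend(data.index[data["class"] == cls][:n])

result = data.drop(index_to_drop)
print(result["class"].value_counts().sort_index())
```

With 20 rows per class, this leaves 10 rows of A, 5 of B and all 20 of C. If a class has fewer than n rows, the slice simply returns all of them, so no error is raised.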

Answer 1 (score: 0)

You can build a Boolean mask with np.logical_or.reduce and groupby.cumcount, then apply it to your DataFrame via iloc:

import numpy as np
import pandas as pd

# example dataframe
df = pd.DataFrame({'group': np.random.randint(0, 3, 100),
                   'value': np.random.random(100)})

print(df.shape)  # (100, 2)

# criteria input
criteria = {1: 10, 2: 15}

# running position of each row within its group (0, 1, 2, ...)
cum_count = df.groupby('group').cumcount()

# Boolean mask: True for the first v rows of each group k, then inverted with ~ to keep the rest
conditions = [(df['group'].eq(k) & cum_count.lt(v)) for k, v in criteria.items()]
mask = ~np.logical_or.reduce(conditions)

# apply Boolean mask
res = df.iloc[mask]

print(res.shape)  # (75, 2), provided group 1 has at least 10 rows and group 2 at least 15
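Both answers remove the first n rows of each class. If it does not matter which rows go, a variant is to drop n randomly chosen rows per group with DataFrame.sample. This is a sketch under assumptions: the data is a hypothetical toy frame with groups assigned round-robin so the row counts are known, and random_state=0 is used only to make the choice reproducible.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical toy data: groups 0, 1, 2 assigned round-robin (34, 33 and 33 rows)
df = pd.DataFrame({"group": np.arange(100) % 3,
                   "value": rng.random(100)})

criteria = {1: 10, 2: 15}

drop_idx = []
for k, v in criteria.items():
    rows_k = df[df["group"] == k]
    # Cap n at the group size so sample() does not raise when a group is small
    picked = rows_k.sample(n=min(v, len(rows_k)), random_state=0)
    drop_idx.extend(picked.index)

res = df.drop(drop_idx)
print(res.shape)  # (75, 2)
```

The min() cap matters because sample without replacement raises a ValueError when n exceeds the number of available rows, whereas the slicing and cumcount approaches silently remove whatever rows exist.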