Question

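I want a new DataFrame that contains only the rows that are duplicated in my existing df. I tried assigning a new column that is True when a row is a duplicate and then selecting only the rows where it is True, but I get 0 entries. I'm sure there are duplicates in my df. In the old DataFrame I want to keep the first row and drop all the other duplicates. The column with the duplicated values is called 'merged'.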

df=df.assign(
    is_duplicate= lambda d: d.duplicated()
).sort_values('merged').reset_index(drop=True)
df2= df.loc[df['is_duplicate'] == 'True']

Answer 1

I think you need boolean indexing; loc can be dropped:

df[df.duplicated()]

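Alternatively, keep your own approach but take .sort_values('merged') and .reset_index(drop=True) out of the chain (sorting is better done as a separate step, either before or after), and filter on the boolean column directly: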

df = df.assign(is_duplicate= lambda d: d.duplicated())
df2= df[df['is_duplicate']]
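
Since the question says the duplicates live in the 'merged' column and that the first row should be kept, a minimal sketch of that variant might look like the following. The sample data and the names mask and df_unique are made up for illustration; only the column name 'merged' comes from the question. It relies on the subset and keep parameters of duplicated and drop_duplicates.

```python
import pandas as pd

# Hypothetical sample data; only the column name 'merged' comes from the question.
df = pd.DataFrame({
    "merged": ["a", "b", "a", "c", "b", "a"],
    "value": [1, 2, 3, 4, 5, 6],
})

# True for every repeat of a 'merged' value after its first occurrence.
mask = df.duplicated(subset="merged", keep="first")

# New frame holding only the repeated rows.
df2 = df[mask]

# Old frame reduced to the first row of each 'merged' value.
df_unique = df.drop_duplicates(subset="merged", keep="first")

print(df2)
print(df_unique)
```

Without subset, duplicated() only flags rows that match in every column, which may be why fewer rows than expected are marked.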

Answer 2

The values are booleans, not strings, so use:

df2 = df.loc[df['is_duplicate']]
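
To see why the original filter returned 0 rows, here is a small sketch on made-up data (the frame contents and the names by_string and by_bool are illustrative only). Comparing a boolean column with the string 'True' is never equal, so the mask is False everywhere.

```python
import pandas as pd

# Hypothetical three-row frame with one repeated value.
df = pd.DataFrame({"merged": ["a", "b", "a"]})
df = df.assign(is_duplicate=lambda d: d.duplicated())

# The flag column holds booleans, not strings.
print(df["is_duplicate"].dtype)

# Comparing booleans with the string 'True' matches nothing.
by_string = df.loc[df["is_duplicate"] == "True"]

# Using the boolean column itself as the mask selects the flagged rows.
by_bool = df.loc[df["is_duplicate"]]

print(len(by_string), len(by_bool))
```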