Question

I have this DataFrame:

  Ubicacion       lat       lon
0         a  19.28034 -99.17121
1         b  19.28333 -99.17535
2         c  19.28028 -99.16887
3         a  19.28034 -99.17121
4         b  19.28333 -99.17535
5         c  19.28028 -99.16887
6         b  19.28333 -99.17535
7         d  19.29259 -99.17757
8         d  19.29259 -99.17757
9         d  19.29259 -99.17757

I want to remove all duplicate rows, so I use:

ubicaciones_finales = ubicaciones_finales.drop_duplicates(keep="first")

I get:

  Ubicacion       lat       lon
0         a  19.28034 -99.17121
1         b  19.28333 -99.17535
2         c  19.28028 -99.16887
7         d  19.29259 -99.17757

Everything looks fine, except that the rows are labeled 0, 1, 2 and 7. So when I run:

for k, row in ubicaciones_finales.iterrows():
    print(k)

I get:
0
1
2
7
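A minimal reproduction, assuming the DataFrame is rebuilt by hand from the table above: `drop_duplicates` keeps the original index labels of the rows it retains, so the labels of the removed rows (3 to 6, 8 and 9) leave gaps. Nothing is broken; the index is simply not renumbered.

```python
import pandas as pd

# DataFrame rebuilt from the table in the question
df = pd.DataFrame({
    "Ubicacion": ["a", "b", "c", "a", "b", "c", "b", "d", "d", "d"],
    "lat": [19.28034, 19.28333, 19.28028, 19.28034, 19.28333,
            19.28028, 19.28333, 19.29259, 19.29259, 19.29259],
    "lon": [-99.17121, -99.17535, -99.16887, -99.17121, -99.17535,
            -99.16887, -99.17535, -99.17757, -99.17757, -99.17757],
})

deduped = df.drop_duplicates(keep="first")

# The surviving rows keep the labels they had in the original frame
print(deduped.index.tolist())  # [0, 1, 2, 7]
```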

How can I fix this? By the way, I already checked the pandas documentation, and its example does the same thing:

df.drop_duplicates()
    brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0

The index there also jumps from 0 to 2, skipping 1. Thanks for your time.

Answer 1

IIUC, use reset_index, or pass ignore_index=True directly (available since pandas 1.0):

df = df.drop_duplicates(keep='first').reset_index(drop=True)

# or 

df = df.drop_duplicates(keep='first', ignore_index=True)
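A quick check, again assuming the question's data is rebuilt by hand, that both options yield the same frame with a fresh 0..n-1 index:

```python
import pandas as pd

# DataFrame rebuilt from the table in the question
df = pd.DataFrame({
    "Ubicacion": ["a", "b", "c", "a", "b", "c", "b", "d", "d", "d"],
    "lat": [19.28034, 19.28333, 19.28028, 19.28034, 19.28333,
            19.28028, 19.28333, 19.29259, 19.29259, 19.29259],
    "lon": [-99.17121, -99.17535, -99.16887, -99.17121, -99.17535,
            -99.16887, -99.17535, -99.17757, -99.17757, -99.17757],
})

# Option 1: drop duplicates, then discard the old labels
via_reset = df.drop_duplicates(keep="first").reset_index(drop=True)

# Option 2: renumber in the same call (pandas >= 1.0)
via_ignore = df.drop_duplicates(keep="first", ignore_index=True)

print(via_reset.index.tolist())      # [0, 1, 2, 3]
print(via_reset.equals(via_ignore))  # True
```

The second form avoids an extra method call; the first also works on pandas versions older than 1.0.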

Output:

  Ubicacion       lat       lon
0         a  19.28034 -99.17121
1         b  19.28333 -99.17535
2         c  19.28028 -99.16887
3         d  19.29259 -99.17757
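Why `drop=True` matters: without it, `reset_index` does not throw the old labels away but inserts them as a new column (named `index` when the index is unnamed). A small sketch on a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical toy data: row 1 duplicates row 0
df = pd.DataFrame({"Ubicacion": ["a", "a", "b"], "lat": [1.0, 1.0, 2.0]})
deduped = df.drop_duplicates()  # keeps labels 0 and 2

# Without drop=True the old labels become a column named "index"
kept = deduped.reset_index()
print(kept.columns.tolist())   # ['index', 'Ubicacion', 'lat']
print(kept["index"].tolist())  # [0, 2]

# With drop=True the old labels are simply discarded
dropped = deduped.reset_index(drop=True)
print(dropped.columns.tolist())  # ['Ubicacion', 'lat']
```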

Problem removing duplicates in a pandas DataFrame
