Question

Sorry for the silly question; I'm not familiar with Python and pandas.

Imagine I have a CSV file with some data on each row, for example:

data1, data2, data3, data4

There is no header, just the data, and I need to remove some of the rows in such a file: if

(row1.data3 and row1.data4) == (row2.data3 and row2.data4)

then delete the entire row.

How can I do this?

I did try to use remove_duplicates, but without a header I don't know how to do it.

Cheers

Answer 1

Suppose you happen to have a df with no header:

import pandas as pd

df = pd.read_csv("./try.csv", header=None)
df
# The first row is integers inserted instead of missing column names
    0   1   2
0   1   1   1
1   1   1   1
2   2   1   3
3   2   1   3
4   3   2   3
5   3   3   3

Then you can call drop_duplicates on a subset of the columns:

df.drop_duplicates([0])
    0   1   2
0   1   1   1
2   2   1   3
4   3   2   3

or

df.drop_duplicates([0,1])
    0   1   2
0   1   1   1
2   2   1   3
4   3   2   3
5   3   3   3

Don't forget to assign the result to a new variable or add inplace=True.
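For the exact case in the question, where the third and fourth fields decide whether two rows count as duplicates, a minimal sketch might look like the following. The sample data and the variable names (csv_text, deduped, strict, out) are made up for illustration, and an in-memory string stands in for the real file. With header=None the columns get integer labels, so data3 and data4 are assumed to be columns 2 and 3.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the header-less CSV file from the question.
csv_text = "a,1,x,10\nb,2,x,10\nc,3,y,20\nd,4,y,30\n"

# header=None makes pandas label the columns 0, 1, 2, 3.
df = pd.read_csv(io.StringIO(csv_text), header=None)

# Keep the first row of every (column 2, column 3) combination.
deduped = df.drop_duplicates(subset=[2, 3])

# Alternative: drop every row whose (column 2, column 3) pair occurs more than once.
strict = df.drop_duplicates(subset=[2, 3], keep=False)

# Serialize the result again without a header or index column.
out = deduped.to_csv(header=False, index=False)
print(out)
```

Which variant fits depends on whether one copy of each duplicated pair should survive (the default, keep="first") or none of them (keep=False). To write straight to disk, pass a path as the first argument of to_csv.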

Python Pandas: removing duplicates in a CSV file with no header
