Question

I am trying to use pandas to remove duplicate rows and record how many times each row occurs. This is what I tried:

createModel['count'] = createModel.groupby(createModel.columns.tolist(),as_index=False).size()
createModel.to_csv(r"test1.csv",index=False,header =True,sep="\t",encoding="utf-16")
createModel.head(10)

But I get an error: TypeError: incompatible index of inserted column with frame index

I know this happens because I add the count column. If I remove it and save the file instead, only the count is saved.

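Please let me know how to save the complete DataFrame without duplicates, with an added column count that holds the number of times each row appears.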
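The error comes from an index mismatch. groupby(...).size() returns one value per distinct combination of the key columns, indexed by those keys, but a new column has to line up with the frame's own row index. This is a minimal sketch that uses a small hypothetical frame in place of createModel. The columns a and b are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for createModel: rows 0 and 2 are duplicates
createModel = pd.DataFrame({"a": [1, 2, 1], "b": ["x", "y", "x"]})
cols = createModel.columns.tolist()

# size() gives one entry per group, indexed by the group keys (a MultiIndex here),
# not one entry per original row
sizes = createModel.groupby(cols).size()

print(sizes)
print(len(createModel), len(sizes))  # prints: 3 2
```

Because the result has fewer entries than the frame, and they are keyed by (a, b) rather than by row position, pandas cannot insert it directly as a column.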

Answer 1

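Use transform for the new column, but you need to select a column after groupby:

cols = createModel.columns.tolist()
#another solution, thanks @jpp
#cols = list(createModel)
createModel['count'] = createModel.groupby(cols)[cols[0]].transform('size')

This keeps every row, including the duplicates, and adds how often each row occurs. If you need to remove the duplicates, you have to aggregate instead and add reset_index.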
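This is a self-contained sketch of both options, keeping every row with a count or dropping the duplicates with a count. The small frame and its columns a and b are hypothetical stand-ins for createModel. The aggregation uses groupby(...).size() followed by reset_index(name="count"), which is one common way to do it and is an assumption about what the answer intended.

```python
import pandas as pd

# Hypothetical toy data standing in for createModel: rows 0 and 2 are duplicates
createModel = pd.DataFrame({"a": [1, 2, 1], "b": ["x", "y", "x"]})
cols = createModel.columns.tolist()

# Option 1: keep all rows; transform returns a result aligned with the original index
with_count = createModel.copy()
with_count["count"] = with_count.groupby(cols)[cols[0]].transform("size")

# Option 2: drop duplicates; one row per distinct combination plus its count
deduped = createModel.groupby(cols).size().reset_index(name="count")

# Save the deduplicated frame with the same options as in the question
deduped.to_csv("test1.csv", index=False, header=True, sep="\t", encoding="utf-16")

print(with_count)
print(deduped)
```

Option 2 matches what the question asks for, because every original column is kept, duplicates are removed, and count is added. By default groupby leaves out rows whose key columns contain NaN. In pandas 1.1 and later you can pass dropna=False to keep them.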

