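I created a key for merging. Unfortunately, some of the keys are duplicated, and I need to keep those rows. My idea is that, within each group of duplicate keys, I could append a counter (1, 2, 3, and so on) to each key to make it unique.
Can anyone recommend a command or approach? Thank you very much.
This is the code I have so far, up to the point where I got stuck: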
# Create a key variable for merging
df['dfkey'] = df['ColA'].map(str) + ' ' + df['ColB'].map(str) + ' ' + df['ColC'].map(str)
# Count the frequency of each dfkey, to know whether the keys are unique
df['dfkeycount'] = df.groupby('dfkey')['dfkey'].transform('count')
# Count the frequency of each dfkey per Category.
# Note: later I will split the dataset by Category, then merge the pieces side by side
# (one variable will be renamed based on the category name).
df['dfkeycountcat'] = df.groupby(['dfkey', 'Category'])['dfkey'].transform('count')
# Subset with clean keys; merging works as long as I stay within this dataset
dataunique = df.loc[df['dfkeycountcat'] == 1]
# Subset where I want to add a sequence number at the end of the key
dataduplicate = df.loc[df['dfkeycountcat'] > 1]
Answer 0 (score: 0)
Thank you very much for the replies. I was able to solve it using cumcount:
df['dfkeynew'] = df['dfkey'].map(str) + df.groupby('dfkey').cumcount().map(str)
df['dfkeycountnew'] = df.groupby('dfkeynew')['dfkeynew'].transform('count')
df['dfkeycountnew'].value_counts()
Every new key now has a count of 1, so they are all unique.
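For anyone who wants to try this on a small example, here is a minimal sketch of the same cumcount idea. The sample data is made up for illustration, and two details are my own assumptions rather than part of the code above: the counter starts at 1 (as the question asked) instead of 0, and an underscore separates the key from the counter. Without a separator, a key ending in a digit could in principle collide with another one (for example "A 1" + "10" and "A 11" + "0" both give "A 110").

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame({
    "ColA": ["a", "a", "a", "b", "c"],
    "ColB": [1, 1, 1, 2, 3],
    "ColC": ["x", "x", "x", "y", "z"],
})

# Build the merge key the same way as in the question.
df["dfkey"] = df["ColA"].map(str) + " " + df["ColB"].map(str) + " " + df["ColC"].map(str)

# cumcount() numbers the rows within each group 0, 1, 2, ...;
# adding 1 makes the sequence start at 1 (an assumption, see above).
seq = df.groupby("dfkey").cumcount() + 1

# Append the counter with an explicit separator (also an assumption).
df["dfkeynew"] = df["dfkey"] + "_" + seq.astype(str)

print(df["dfkeynew"].tolist())
# ['a 1 x_1', 'a 1 x_2', 'a 1 x_3', 'b 2 y_1', 'c 3 z_1']

# Quick uniqueness check, as an alternative to counting with transform('count').
print(df["dfkeynew"].is_unique)
# True
```

If the numbering should restart within each category instead, grouping by ['dfkey', 'Category'] before calling cumcount() should give a counter per key and category, which may fit better with the plan of splitting the data by Category and merging the pieces side by side.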