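When a value in one column is repeated and the repeated rows also share the same value in another column, I only want to keep the first occurrence in that column and replace the other duplicates with an empty string. For example, in the text column, "hey how are you" should appear only once, because all of its rows share the same date, "2016-09-10", in the date column.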
import pandas as pd

data = {'date': ['2016-09-10', '2016-09-10',
                 '2016-09-10', '2016-09-10',
                 '2016-09-12', '2016-09-12',
                 '2016-09-13', '2016-09-13'],
        'text': ['hey how are you', 'hey how are you', 'hey how are you', 'good thanks',
                 'good thanks', 'good thanks', 'good thanks', 'good thanks']}
df = pd.DataFrame(data)
The current output is:
date text
2016-09-10 hey how are you
2016-09-10 hey how are you
2016-09-10 hey how are you
2016-09-10 good thanks
2016-09-12 good thanks
The output I want is:
date text
2016-09-10 hey how are you
2016-09-10
2016-09-10
2016-09-10 good thanks
2016-09-12 good thanks
Answer 0 (score: 4)
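Use DataFrame.duplicated to flag every row that repeats an earlier (date, text) pair, and DataFrame.loc to set the text of those rows to an empty string: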
df.loc[df.duplicated(['date', 'text']), 'text'] = ''
# if the DataFrame has only these 2 columns, the subset can be omitted:
# df.loc[df.duplicated(), 'text'] = ''
print(df)
date text
0 2016-09-10 hey how are you
1 2016-09-10
2 2016-09-10
3 2016-09-10 good thanks
4 2016-09-12 good thanks
5 2016-09-12
6 2016-09-13 good thanks
7 2016-09-13
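
To make it easier to see which rows get blanked, here is a minimal self-contained sketch that builds the boolean mask explicitly. It is a variation on the answer above, not a replacement for it: the names is_repeat and result are illustrative, and Series.mask with DataFrame.assign is swapped in for the .loc assignment so that the original df is left unchanged.

```python
import pandas as pd

data = {'date': ['2016-09-10', '2016-09-10',
                 '2016-09-10', '2016-09-10',
                 '2016-09-12', '2016-09-12',
                 '2016-09-13', '2016-09-13'],
        'text': ['hey how are you', 'hey how are you', 'hey how are you', 'good thanks',
                 'good thanks', 'good thanks', 'good thanks', 'good thanks']}
df = pd.DataFrame(data)

# True for every row whose (date, text) pair already appeared in an earlier row;
# keep='first' (the default) leaves the first occurrence unflagged.
is_repeat = df.duplicated(subset=['date', 'text'], keep='first')

# Build a new frame in which the flagged rows have an empty text value.
# Series.mask replaces values where the condition is True.
result = df.assign(text=df['text'].mask(is_repeat, ''))

print(is_repeat.tolist())
print(result)
```

Because duplicated compares the date and text columns together, "good thanks" on 2016-09-10 and on 2016-09-12 count as different pairs, so both are kept.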