Question

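I need to keep only the first occurrence of a duplicated value in one column when the rows also share the same value in another column. The other duplicates should be replaced with an empty string. For example, the text "hey how are you" should appear only once, because all of its rows have the same date, "2016-09-10", in the date column.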

import pandas as pd

data = {'date': ['2016-09-10', '2016-09-10',
                 '2016-09-10', '2016-09-10',
                 '2016-09-12', '2016-09-12',
                 '2016-09-13', '2016-09-13'],
        'text': ['hey how are you', 'hey how are you', 'hey how are you', 'good thanks',
                  'good thanks', 'good thanks', 'good thanks', 'good thanks']}

df = pd.DataFrame(data)

The current output is:

date           text
2016-09-10     hey how are you
2016-09-10     hey how are you
2016-09-10     hey how are you
2016-09-10     good thanks
2016-09-12     good thanks

The output I want is:

date           text
2016-09-10     hey how are you
2016-09-10     
2016-09-10     
2016-09-10     good thanks
2016-09-12     good thanks

Answer 1

Use DataFrame.duplicated to build a boolean mask of the repeated (date, text) pairs, then use DataFrame.loc to set an empty string on those rows:

df.loc[df.duplicated(['date','text']), 'text'] = ''

# if the DataFrame has only these 2 columns, the subset can be omitted
#df.loc[df.duplicated(), 'text'] = ''
print(df)
         date             text
0  2016-09-10  hey how are you
1  2016-09-10                 
2  2016-09-10                 
3  2016-09-10      good thanks
4  2016-09-12      good thanks
5  2016-09-12                 
6  2016-09-13      good thanks
7  2016-09-13
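
If you would rather not modify df in place, a possible variant is to build the same mask and apply it with Series.mask, which returns a new Series. This is only a sketch: the names dup and result are my own, and keep='first' is spelled out although it is already the default.

```python
import pandas as pd

data = {'date': ['2016-09-10', '2016-09-10',
                 '2016-09-10', '2016-09-10',
                 '2016-09-12', '2016-09-12',
                 '2016-09-13', '2016-09-13'],
        'text': ['hey how are you', 'hey how are you', 'hey how are you', 'good thanks',
                 'good thanks', 'good thanks', 'good thanks', 'good thanks']}
df = pd.DataFrame(data)

# True for every repeat of a (date, text) pair after its first occurrence
dup = df.duplicated(subset=['date', 'text'], keep='first')

# mask() replaces values where the condition is True; df itself is left unchanged
result = df.assign(text=df['text'].mask(dup, ''))
print(result)
```

The blanked rows are the same as with the df.loc assignment above; the only difference is that the original df keeps its text values.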
