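If several rows have the same value in the text column, I want to change all of their dates in the date column to the earliest date.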
import pandas as pd
df = pd.DataFrame({
    'text': ['I like python pandas',
             'find all function input from help jupyter',
             'function input',
             'function input',
             'function input'],
    'date': ['March 1st', 'March 2nd', 'March 3rd', 'March 4th', 'March 5th'],
})
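So I want to change March 4th and March 5th to March 3rd, because that is the earliest date on which "function input" appears in the text column. Any help would be appreciated.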
Answer 0 (score: 1)
You can group by text and then join the result back onto the original DataFrame. Something like this:
new_df = df.set_index('text').join(df.groupby('text').first(), lsuffix='_old')
Then print(new_df) shows:
                                            date_old       date
text
I like python pandas                       March 1st  March 1st
find all function input from help jupyter  March 2nd  March 2nd
function input                             March 3rd  March 3rd
function input                             March 4th  March 3rd
function input                             March 5th  March 3rd
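If you would rather keep the original integer index and overwrite the date column in place, the same "group, then look up" idea can be sketched with map instead of join. This is only a minimal sketch on the question's sample data; the name first_date is mine, not from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'text': ['I like python pandas',
             'find all function input from help jupyter',
             'function input',
             'function input',
             'function input'],
    'date': ['March 1st', 'March 2nd', 'March 3rd', 'March 4th', 'March 5th'],
})

# One row per distinct text: the first date seen for that text (row order).
# first_date is an illustrative name, not part of the original answer.
first_date = df.groupby('text')['date'].first()

# Look up each row's text in that Series and overwrite the date column.
df['date'] = df['text'].map(first_date)
print(df)
```

Like the join above, this relies on the rows already being in date order, since it takes the first row of each group rather than comparing dates.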
Answer 1 (score: 1)
You can do this:
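def update_col(col):
    # Overwrite every value in the group with the group's first value.
    col = col.copy()
    col[:] = col.iloc[0]
    return col

# group_keys=False keeps the original row index so the result aligns with df.
df['date'] = df.groupby('text', group_keys=False)['date'].apply(update_col)
df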
#                                         text       date
# 0                       I like python pandas  March 1st
# 1  find all function input from help jupyter  March 2nd
# 2                             function input  March 3rd
# 3                             function input  March 3rd
# 4                             function input  March 3rd
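A shorter way to get the same result, as far as I can tell, is groupby(...).transform('first'), which broadcasts each group's first value back to every row of that group without a helper function. A minimal sketch on the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    'text': ['I like python pandas',
             'find all function input from help jupyter',
             'function input',
             'function input',
             'function input'],
    'date': ['March 1st', 'March 2nd', 'March 3rd', 'March 4th', 'March 5th'],
})

# transform returns a Series with the same index as df, so it can be
# assigned straight back. Note that 'first' picks the first non-null
# value in each group, in row order.
df['date'] = df.groupby('text')['date'].transform('first')
print(df)
```

Because the result is already aligned to the original index, there is no need to worry about group keys being added to it.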
Answer 2 (score: 1)
How about this?
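# Keep only the first row for each distinct text; its date is the one to propagate.
df1 = df.drop_duplicates(['text'], keep='first')
# Drop the old date column (without modifying df), then pull the first date back in by text.
df2 = pd.merge(df.drop(columns='date'), df1, how='left', on=['text'])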
Output of print(df2):
                                        text       date
0                       I like python pandas  March 1st
1  find all function input from help jupyter  March 2nd
2                             function input  March 3rd
3                             function input  March 3rd
4                             function input  March 3rd
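One caveat: keep='first' (like taking the first row of each group) picks the first occurrence in row order, which is only the earliest date if the rows are already sorted by date. If they might not be, here is a rough sketch that compares the dates themselves. It assumes the strings always look like "March 3rd", and it tacks on an arbitrary year (2000) purely so they can be parsed; the shuffled sample data and the names day_month, parsed, earliest_idx and earliest_by_text are mine.

```python
import pandas as pd

# Sample data deliberately out of date order (hypothetical, for illustration).
df = pd.DataFrame({
    'text': ['function input', 'I like python pandas',
             'function input', 'function input'],
    'date': ['March 5th', 'March 1st', 'March 3rd', 'March 4th'],
})

# Strip the ordinal suffix ("5th" -> "5") and parse into real timestamps.
# The year 2000 is an arbitrary assumption; the strings carry no year.
day_month = df['date'].str.replace(r'(\d+)(st|nd|rd|th)$', r'\1', regex=True)
parsed = pd.to_datetime(day_month + ' 2000', format='%B %d %Y')

# Row label of the chronologically earliest date within each text group.
earliest_idx = parsed.groupby(df['text']).idxmin()

# Original date string at those rows, keyed by text, mapped back onto every row.
earliest_by_text = df.loc[earliest_idx].set_index('text')['date']
df['date'] = df['text'].map(earliest_by_text)
print(df)
```

Here every "function input" row ends up with March 3rd even though March 5th comes first in the frame.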