I need help pivoting a dataframe.
import pandas as pd

data = [{'Start Date': '12/3/2016',
         'End Date': '12/4/2016',
         'Name': 'John'},
        {'Start Date': '12/3/2016',
         'End Date': '12/4/2016',
         'Name': 'Karen'},
        {'Start Date': '12/1/2016',
         'End Date': '12/2/2016',
         'Name': 'John'},
        {'Start Date': '12/1/2016',
         'End Date': '12/2/2016',
         'Name': None},
        {'Start Date': '12/5/2016',
         'End Date': '12/6/2016',
         'Name': 'Jeff'},
        {'Start Date': '12/5/2016',
         'End Date': '12/6/2016',
         'Name': 'John'}]
df = pd.DataFrame(data)
df
I need it to look like this. It doesn't matter which Person column a name ends up in, as long as every name is listed.
Answer 0 (score: 0)
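Consider merging the dataframe with itself (similar to a SQL self-join). The merge pairs every row with every other row that shares the same End Date, so you then filter out the rows where both names match and keep only the first remaining pairing in each group, using a per-group row count: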
mdf = pd.merge(df, df, on='End Date')
mdf['grp'] = mdf.groupby('End Date').cumcount()
# End Date Name_x Start Date_x Name_y Start Date_y grp
# 0 12/4/2016 John 12/3/2016 John 12/3/2016 0
# 1 12/4/2016 John 12/3/2016 Karen 12/3/2016 1
# 2 12/4/2016 Karen 12/3/2016 John 12/3/2016 2
# 3 12/4/2016 Karen 12/3/2016 Karen 12/3/2016 3
# 4 12/2/2016 John 12/1/2016 John 12/1/2016 0
# 5 12/2/2016 John 12/1/2016 None 12/1/2016 1
# 6 12/2/2016 None 12/1/2016 John 12/1/2016 2
# 7 12/2/2016 None 12/1/2016 None 12/1/2016 3
# 8 12/6/2016 Jeff 12/5/2016 Jeff 12/5/2016 0
# 9 12/6/2016 Jeff 12/5/2016 John 12/5/2016 1
# 10 12/6/2016 John 12/5/2016 Jeff 12/5/2016 2
# 11 12/6/2016 John 12/5/2016 John 12/5/2016 3
mdf = mdf[(mdf['Name_x'] != mdf['Name_y']) & (mdf['grp']==1)] # FILTER ROWS
# End Date Name_x Start Date_x Name_y Start Date_y grp
# 1 12/4/2016 John 12/3/2016 Karen 12/3/2016 1
# 5 12/2/2016 John 12/1/2016 None 12/1/2016 1
# 9 12/6/2016 Jeff 12/5/2016 John 12/5/2016 1
mdf = (mdf[['End Date', 'Name_x', 'Name_y', 'Start Date_x']]
       .sort_values(['End Date']).reset_index(drop=True)) # SUBSET COLUMNS
mdf.columns = ['End Date', 'Person 1', 'Person 2', 'Start Date'] # RE-NAME COLUMNS
# End Date Person 1 Person 2 Start Date
# 0 12/2/2016 John None 12/1/2016
# 1 12/4/2016 John Karen 12/3/2016
# 2 12/6/2016 Jeff John 12/5/2016
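
Note that the `grp == 1` filter above relies on each End Date having exactly two rows. As an alternative sketch (not part of the original answer), you could number the rows within each date range and unstack those numbers into columns, which avoids the self-join and should extend to more than two people per range. The helper column name `slot` and the variable `wide` are hypothetical names chosen for illustration, and the sample data is assumed to be the same as in the question:

```python
import pandas as pd

# Same sample data as in the question
data = [
    {'Start Date': '12/3/2016', 'End Date': '12/4/2016', 'Name': 'John'},
    {'Start Date': '12/3/2016', 'End Date': '12/4/2016', 'Name': 'Karen'},
    {'Start Date': '12/1/2016', 'End Date': '12/2/2016', 'Name': 'John'},
    {'Start Date': '12/1/2016', 'End Date': '12/2/2016', 'Name': None},
    {'Start Date': '12/5/2016', 'End Date': '12/6/2016', 'Name': 'Jeff'},
    {'Start Date': '12/5/2016', 'End Date': '12/6/2016', 'Name': 'John'},
]
df = pd.DataFrame(data)

# Number each row within its (Start Date, End Date) group: 1, 2, ...
slot = df.groupby(['Start Date', 'End Date']).cumcount() + 1

wide = (
    df.assign(slot='Person ' + slot.astype(str))   # labels 'Person 1', 'Person 2', ...
      .set_index(['Start Date', 'End Date', 'slot'])['Name']
      .unstack('slot')                             # one column per slot label
      .reset_index()
)
wide.columns.name = None   # drop the leftover 'slot' label on the column axis
print(wide)
```

This should give one row per date range with `Person 1` and `Person 2` columns, and the missing name stays empty. Two caveats: the dates are still plain strings, so row order is lexicographic rather than chronological, and the column labels are sorted as text, so with ten or more people per range `Person 10` would sort before `Person 2`.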