I have some data in df1 and df2. Based on the value of the Interval column in df1, I want to pull from df2 the specific Start and End whose span matches that interval.
df1:
ID Interval
1 annual
2 quarterly
3 semiannual
df2:
ID Start End
1 AUG-FY21 JAN-FY22
1 AUG-FY21 OCT-FY21
1 AUG-FY21 JUL-FY22
2 AUG-FY21 JAN-FY22
2 AUG-FY21 OCT-FY21
3 AUG-FY21 JAN-FY22
3 AUG-FY21 OCT-FY21
3 AUG-FY21 JUL-FY22
output:
ID Interval Start End
1 annual AUG-FY21 JUL-FY22
2 quarterly AUG-FY21 OCT-FY21
3 semiannual AUG-FY21 JAN-FY22
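Answer 0 (score: 0)
A solution with pandas: compute the difference in days between Start and End, label each row of df2 with an interval based on that difference (the day thresholds are chosen arbitrarily), then merge the two data frames on ID and Interval.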
# reproduce the test case
import pandas as pd
data_1 = {'ID': [1, 2, 3],
'Interval': ['annual', 'quarterly', 'semiannual']}
df1 = pd.DataFrame(data_1)
data_2 = {'ID': [1, 1, 1, 2, 2, 3, 3, 3],
'Start': ['AUG-FY21', 'AUG-FY21', 'AUG-FY21', 'AUG-FY21', 'AUG-FY21', 'AUG-FY21', 'AUG-FY21', 'AUG-FY21'],
'End': ['JAN-FY22', 'OCT-FY21', 'JUL-FY22', 'JAN-FY22', 'OCT-FY21', 'JAN-FY22', 'OCT-FY21', 'JUL-FY22']}
df2 = pd.DataFrame(data_2)
# compute the interval in days between Start and End ('AUG-FY21' is parsed as 'AUG 2021')
df2['Days_interval'] = (pd.to_datetime(df2.End.str.replace('-FY', ' 20')) - pd.to_datetime(df2.Start.str.replace('-FY', ' 20'))).abs().dt.days
df2['Interval'] = ''
# assign labels based on the days interval (thresholds are arbitrary)
df2.loc[df2['Days_interval'] < 100, 'Interval'] = 'quarterly'
df2.loc[(df2['Days_interval'] >= 100) & (df2['Days_interval'] <= 300), 'Interval'] = 'semiannual'
df2.loc[df2['Days_interval'] > 300, 'Interval'] = 'annual'
# exclude the helper column
df2.drop('Days_interval', axis = 1, inplace = True)
# merge both dfs by ID and Interval
output = pd.merge(df1, df2, how='inner', on = ['ID', 'Interval'])
# remove the helper column from the original df2
df2.drop('Interval', axis = 1, inplace = True)
output
ID Interval Start End
0 1 annual AUG-FY21 JUL-FY22
1 2 quarterly AUG-FY21 OCT-FY21
2 3 semiannual AUG-FY21 JAN-FY22
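The three threshold assignments can also be written as a single binning step. The following is a minimal sketch, not part of the original answer: it assumes the question's data, an English locale for parsing month abbreviations, and the same arbitrary cut-offs (under 100 days, 100 to 300 days, over 300 days). The names `days`, `labels`, and `matched` are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3],
                    'Interval': ['annual', 'quarterly', 'semiannual']})
df2 = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 3, 3, 3],
    'Start': ['AUG-FY21'] * 8,
    'End': ['JAN-FY22', 'OCT-FY21', 'JUL-FY22', 'JAN-FY22',
            'OCT-FY21', 'JAN-FY22', 'OCT-FY21', 'JUL-FY22'],
})

# Parse 'AUG-FY21' as August 2021 (assumption: FY21 maps to calendar year 2021).
start = pd.to_datetime(df2['Start'], format='%b-FY%y')
end = pd.to_datetime(df2['End'], format='%b-FY%y')
days = (end - start).dt.days

# Bins are right-inclusive: (-inf, 99], (99, 300], (300, inf).
labels = pd.cut(days,
                bins=[float('-inf'), 99, 300, float('inf')],
                labels=['quarterly', 'semiannual', 'annual'])

# Attach the label without mutating df2, then keep only the matching rows.
matched = df1.merge(df2.assign(Interval=labels.astype(str)),
                    on=['ID', 'Interval'], how='inner')
print(matched)
```

Because the label is attached with `assign`, df2 itself is left unchanged, so no helper columns need to be dropped afterwards.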
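Answer 1 (score: 0)
You can convert the Start and End columns to dates, count the number of month-ends between them, and then use a dictionary to replace that count with the desired word. Merge on ID and Interval while the columns are still datetimes, then convert them back to strings.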
import pandas as pd
offsets= {11:'annual',
2:'quarterly',
5:'semiannual'}
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Interval': ['annual', 'quarterly', 'semiannual']})
df2 = pd.DataFrame({'ID': [1, 1, 1, 2, 2, 3, 3, 3],
'Start': ['AUG-FY21','AUG-FY21','AUG-FY21','AUG-FY21','AUG-FY21','AUG-FY21','AUG-FY21','AUG-FY21'],
'End': ['JAN-FY22','OCT-FY21','JUL-FY22','JAN-FY22','OCT-FY21','JAN-FY22','OCT-FY21','JUL-FY22']})
df2['Start'] =pd.to_datetime(df2['Start'], format='%b-FY%y')
df2['End'] =pd.to_datetime(df2['End'], format='%b-FY%y')
df2['Interval'] = df2.apply(lambda x: len(pd.date_range(start=x['Start'], end=x['End'], freq='M')), axis=1)
df2['Interval'] = df2['Interval'].replace(offsets)
output = df1.merge(df2, on=['ID','Interval'], how='left')
output['Start'] = output['Start'].dt.strftime(date_format='%b-FY%y').str.upper()
output['End'] = output['End'].dt.strftime(date_format='%b-FY%y').str.upper()
Output
ID Interval Start End
0 1 annual AUG-FY21 JUL-FY22
1 2 quarterly AUG-FY21 OCT-FY21
2 3 semiannual AUG-FY21 JAN-FY22
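The month count can also be computed arithmetically from the year and month parts, which avoids the row-wise `apply` and the `'M'` frequency alias (which newer pandas releases reportedly deprecate in favor of `'ME'`). This is a sketch under the same assumptions as the answer above, reusing its `offsets` mapping; the names `months` and `result` are illustrative and not from the original.

```python
import pandas as pd

offsets = {11: 'annual', 2: 'quarterly', 5: 'semiannual'}

df1 = pd.DataFrame({'ID': [1, 2, 3],
                    'Interval': ['annual', 'quarterly', 'semiannual']})
df2 = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 3, 3, 3],
    'Start': ['AUG-FY21'] * 8,
    'End': ['JAN-FY22', 'OCT-FY21', 'JUL-FY22', 'JAN-FY22',
            'OCT-FY21', 'JAN-FY22', 'OCT-FY21', 'JUL-FY22'],
})

# Parse into temporary Series so the original string columns stay untouched.
start = pd.to_datetime(df2['Start'], format='%b-FY%y')
end = pd.to_datetime(df2['End'], format='%b-FY%y')

# Whole months between the two dates, e.g. AUG 2021 -> JAN 2022 gives 5.
months = (end.dt.year - start.dt.year) * 12 + (end.dt.month - start.dt.month)

# Map the month count to its label; counts missing from offsets become NaN.
result = df1.merge(df2.assign(Interval=months.map(offsets)),
                   on=['ID', 'Interval'], how='left')
print(result)
```

Since Start and End are never overwritten, there is no need to format them back to strings after the merge.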