I have the following two data frames:
----
df=
city code qty month year
hyd 1 10 1 2016
hyd 2 12 2 2016
hyd 3 15 3 2016
hyd 1 25 1 2017
hyd 2 15 2 2017
hyd 3 25 4 2017
hyd 1 25 1 2018
hyd 2 15 3 2018
hyd 3 25 6 2018
b =
city code qty month year
hyd 1 10 1 2016
hyd 2 12 2 2016
hyd 3 18 3 2016
hyd 4 22 4 2016
hyd 5 10 5 2016
hyd 6 12 6 2016
hyd 1 12 1 2017
hyd 2 12 2 2017
hyd 3 16 3 2017
hyd 4 25 4 2017
hyd 5 10 5 2017
hyd 6 14 6 2017
hyd 1 10 1 2018
hyd 2 12 2 2018
hyd 3 18 3 2018
hyd 4 25 4 2018
hyd 5 10 5 2018
hyd 6 12 6 2018
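I want to compare df with b and, for each row of df, get the quantities for the same month in previous years in a single row. A year should only be compared with years earlier than itself. Below is the resulting data frame.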
resultdf=
city code qty month year qty_2016 qty_2017 qty_2018
hyd  1    10  1     2016
hyd  2    12  2     2016
hyd  3    15  3     2016
hyd  1    25  1     2017 10
hyd  2    15  2     2017 12
hyd  3    25  4     2017 22
hyd  1    25  1     2018 10       15
hyd  2    15  3     2018 18       16
hyd  3    25  6     2018 12       14
Below is the code:
import pandas as pd

attribute_name = 'city'
attribute_code = 'code'
result = []  # collects one merged frame per (city, code, month) combination
# NOTE: the filters below compare against strings ('2018', str(month_num), ...),
# so they only match if the columns are stored as strings.
df1 = df[df['year'].isin(['2018'])]
month_list = df1.month.unique()
feature_list = df1[attribute_name].unique()
code_list = df1[attribute_code].unique()
for feature_name in feature_list:
    for code_num in code_list:
        for month_num in month_list:
            # rows of b for this city/code/month in 2018
            dff2 = b[(b['month'].isin([str(month_num)])) & (b['year'].isin(['2018'])) & (b['code'].isin([str(code_num)])) & (b['city'].isin([str(feature_name)]))]
            dff2 = dff2.drop(['year'], axis=1)
            dff2 = dff2.rename(columns={'qty': 'qty_2018'})
            # same selection for 2017
            dff = b[(b['month'].isin([str(month_num)])) & (b['year'].isin(['2017'])) & (b['code'].isin([str(code_num)])) & (b['city'].isin([str(feature_name)]))]
            dff = dff.drop(['year'], axis=1)
            dff = dff.rename(columns={'qty': 'qty_2017'})
            # same selection for 2016
            dff1 = b[(b['month'].isin([str(month_num)])) & (b['year'].isin(['2016'])) & (b['code'].isin([str(code_num)])) & (b['city'].isin([str(feature_name)]))]
            dff1 = dff1.drop(['year'], axis=1)
            dff1 = dff1.rename(columns={'qty': 'qty_2016'})
            # put the three yearly quantities side by side
            df2 = dff2.merge(dff, on=['city','code','month'], how='left')
            df3 = df2.merge(dff1, on=['city','code','month'], how='left')
            result.append(df3)
frame = pd.concat(result)
frame['year'] = 2018
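In the same way, I repeat this for 2017 to get qty_2017 and qty_2016 as frame1, and then concatenate frame and frame1.
The above code gives me the required result, but it is very time-consuming, and the years are hard-coded rather than all handled in a loop. How can I make this better and faster? Any help would be appreciated.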
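For reference, here is a minimal sketch of one loop-free way this could be done. It is not a definitive implementation. It assumes the columns hold integers, that each (city, month, year) combination appears at most once in b, and that rows are matched on city and month only (the expected row for code 3, month 4 in 2017 shows 22, which is b's month-4 value for 2016). The names wide and years are only illustrative.

```python
import pandas as pd

# Sample data copied from the tables in the question (integer dtypes are an assumption).
df = pd.DataFrame({
    "city": ["hyd"] * 9,
    "code": [1, 2, 3] * 3,
    "qty": [10, 12, 15, 25, 15, 25, 25, 15, 25],
    "month": [1, 2, 3, 1, 2, 4, 1, 3, 6],
    "year": [2016] * 3 + [2017] * 3 + [2018] * 3,
})
b = pd.DataFrame({
    "city": ["hyd"] * 18,
    "code": [1, 2, 3, 4, 5, 6] * 3,
    "qty": [10, 12, 18, 22, 10, 12,
            12, 12, 16, 25, 10, 14,
            10, 12, 18, 25, 10, 12],
    "month": [1, 2, 3, 4, 5, 6] * 3,
    "year": [2016] * 6 + [2017] * 6 + [2018] * 6,
})

# Reshape b so there is one row per (city, month) and one column per year.
wide = b.pivot_table(index=["city", "month"], columns="year",
                     values="qty", aggfunc="first")
years = list(wide.columns)
wide.columns = [f"qty_{y}" for y in years]
wide = wide.reset_index()

# Attach every year's quantity to each row of df with a single merge.
resultdf = df.merge(wide, on=["city", "month"], how="left")

# Keep a year's quantity only where that year is strictly earlier than
# the row's own year; everything else becomes NaN.
for y in years:
    resultdf[f"qty_{y}"] = resultdf[f"qty_{y}"].where(resultdf["year"] > y)

print(resultdf)
```

Because the year columns come from whatever years are present in b, no year is hard-coded. The values are taken directly from b, so the row for month 1 of 2018 gets qty_2017 = 12 here. If code should also be part of the match, it could be added to both the pivot index and the merge keys.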