I have a DataFrame:
import pandas as pd

df = pd.DataFrame({'date': ['2013-04-01','2013-04-01','2013-04-01','2013-04-02', '2013-04-02'],
                   'month': ['1','1','3','3','5'],
                   'pmonth': ['1', '1', '2', '5', '5'],
                   'duration': [30, 15, 20, 15, 30],
                   'pduration': ['10', '20', '30', '40', '50']})
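I need to divide duration and pduration by the value column of a second DataFrame, for the rows where the date and month of the two DataFrames match. The second DataFrame is: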
# named df2 here so that it does not overwrite the first DataFrame
df2 = pd.DataFrame({'date': ['2013-04-01','2013-04-02','2013-04-03','2013-04-04', '2013-04-05'],
                    'month': ['1','1','3','3','5'],
                    'value': ['1', '1', '2', '5', '5'],
                    })
The second DataFrame is grouped by date and month, so no date/month combination appears in it more than once.
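Answer 0 (score: 1)
First, check that the dtypes of the date and month columns are the same in both DataFrames, and that the columns to be divided are numeric. After that, a left-join merge followed by DataFrame.div does the division.
# convert to numeric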
df1['pduration'] = df1['pduration'].astype(int)
df2['value'] = df2['value'].astype(int)
print (df1.dtypes)
date object
month object
pmonth object
duration int64
pduration int32
print (df2.dtypes)
date object
month object
value int32
dtype: object
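The dtype check described above can be sketched as follows. This is a minimal example that rebuilds the question's data under the names df1 and df2 used in this answer; it uses pd.to_numeric as an alternative to astype(int), which is my substitution rather than something the answer requires.

```python
import pandas as pd

df1 = pd.DataFrame({
    'date': ['2013-04-01', '2013-04-01', '2013-04-01', '2013-04-02', '2013-04-02'],
    'month': ['1', '1', '3', '3', '5'],
    'pmonth': ['1', '1', '2', '5', '5'],
    'duration': [30, 15, 20, 15, 30],
    'pduration': ['10', '20', '30', '40', '50'],
})
df2 = pd.DataFrame({
    'date': ['2013-04-01', '2013-04-02', '2013-04-03', '2013-04-04', '2013-04-05'],
    'month': ['1', '1', '3', '3', '5'],
    'value': ['1', '1', '2', '5', '5'],
})

# The merge keys should have the same dtype in both frames.
keys = ['date', 'month']
keys_match = (df1[keys].dtypes == df2[keys].dtypes).all()

# Convert the string columns that take part in the division.
df1['pduration'] = pd.to_numeric(df1['pduration'])
df2['value'] = pd.to_numeric(df2['value'])
```

If keys_match were False, one of the key columns would need converting (for example with astype(str)) before merging.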
Merge with a left join, then divide with DataFrame.div:
df = df1.merge(df2, on=['date', 'month'], how='left')
df[['duration_new','pduration_new']] = df[['duration','pduration']].div(df['value'], axis=0)
print (df)
date month pmonth duration pduration value duration_new \
0 2013-04-01 1 1 30 10 1.0 30.0
1 2013-04-01 1 1 15 20 1.0 15.0
2 2013-04-01 3 2 20 30 NaN NaN
3 2013-04-02 3 5 15 40 NaN NaN
4 2013-04-02 5 5 30 50 NaN NaN
pduration_new
0 10.0
1 20.0
2 NaN
3 NaN
4 NaN
To remove the value column from the result, pass df.pop('value') as the divisor instead of df['value'].
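The pop variant could look like the sketch below. It assumes the question's data with pduration and value already converted to numbers, and reuses the duration_new / pduration_new column names from this answer.

```python
import pandas as pd

df1 = pd.DataFrame({
    'date': ['2013-04-01', '2013-04-01', '2013-04-01', '2013-04-02', '2013-04-02'],
    'month': ['1', '1', '3', '3', '5'],
    'pmonth': ['1', '1', '2', '5', '5'],
    'duration': [30, 15, 20, 15, 30],
    'pduration': [10, 20, 30, 40, 50],   # already numeric in this sketch
})
df2 = pd.DataFrame({
    'date': ['2013-04-01', '2013-04-02', '2013-04-03', '2013-04-04', '2013-04-05'],
    'month': ['1', '1', '3', '3', '5'],
    'value': [1, 1, 2, 5, 5],            # already numeric in this sketch
})

df = df1.merge(df2, on=['date', 'month'], how='left')

# df.pop('value') returns the column as a Series and removes it from df,
# so the divisor does not remain in the result.
df[['duration_new', 'pduration_new']] = (
    df[['duration', 'pduration']].div(df.pop('value'), axis=0)
)
```

Rows without a matching date/month pair in df2 still end up as NaN in the new columns, as in the output shown earlier.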
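Answer 1 (score: 0)
You can merge the second DataFrame into the first and then divide. Treat the first DataFrame as df1 and the second as df2. Rows without a match get a value of 1 through fillna(1), so the division leaves them unchanged.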
df1 = df1.merge(df2, on=['date', 'month'], how='left').fillna(1)
df1
date month pmonth duration pduration value
0 2013-04-01 1 1 30 10 1
1 2013-04-01 1 1 15 20 1
2 2013-04-01 3 2 20 30 1
3 2013-04-02 3 5 15 40 1
4 2013-04-02 5 5 30 50 1
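# pduration and value are strings in the question's data; convert before dividing
value = df1['value'].astype(float)
df1['duration'] = df1['duration'] / value
df1['pduration'] = df1['pduration'].astype(float) / value
df1.drop('value', axis=1, inplace=True)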
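One caveat with this approach is that calling fillna(1) on the merged frame fills missing values in every column, not only in value. A hedged variant, assuming the question's data in numeric form, fills only the divisor. The validate='many_to_one' argument is my addition, not part of the answer; it asks merge to raise an error if df2 ever contains a duplicate date/month pair.

```python
import pandas as pd

df1 = pd.DataFrame({
    'date': ['2013-04-01', '2013-04-01', '2013-04-01', '2013-04-02', '2013-04-02'],
    'month': ['1', '1', '3', '3', '5'],
    'pmonth': ['1', '1', '2', '5', '5'],
    'duration': [30, 15, 20, 15, 30],
    'pduration': [10, 20, 30, 40, 50],   # already numeric in this sketch
})
df2 = pd.DataFrame({
    'date': ['2013-04-01', '2013-04-02', '2013-04-03', '2013-04-04', '2013-04-05'],
    'month': ['1', '1', '3', '3', '5'],
    'value': [1, 1, 2, 5, 5],            # already numeric in this sketch
})

# validate='many_to_one' checks that the merge keys are unique in df2.
merged = df1.merge(df2, on=['date', 'month'], how='left',
                   validate='many_to_one')

# Fill only the divisor: unmatched rows are divided by 1 and stay unchanged.
divisor = merged.pop('value').fillna(1)
merged[['duration', 'pduration']] = (
    merged[['duration', 'pduration']].div(divisor, axis=0)
)
```

Whether unmatched rows should be left unchanged (divide by 1) or marked as NaN depends on what the result is used for; the two answers make different choices here.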
Answer 2 (score: 0)
You can merge the two DataFrames: where date and month match, the value column is added to the first DataFrame's rows, and where there is no match it is NaN. The division can then be done on the merged result. The code below assumes the first DataFrame is df and the second is df2:
df3 = df2.merge(df, how = 'right')
for col in ['duration','pduration']:
    df3['new_'+col] = df3[col].astype(float)/df3['value'].astype(float)
df3