I have the following DataFrame:
import numpy as np
import pandas as pd

df = pd.DataFrame({'date': ['31/12/2015', '31/12/2016', '31/12/2017', '31/12/2018',
                            '31/12/2019', '31/12/2020', '31/12/2015', '31/12/2016',
                            '31/12/2017', '31/12/2018', '31/12/2019', '31/12/2020'],
                   'season': ['S1', 'S1', 'S1', 'S1', 'S1', 'S1', 'S2', 'S2', 'S2', 'S2', 'S2', 'S2'],
                   'total': [1, 0, 0, 0, 0.022313421, 0.053791041, 0, 0, 0.307783314, 0, 0, 0]})
# the dates are day-first, so give the format explicitly
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y')
print(df)
         date season     total
0  2015-12-31     S1  1.000000
1  2016-12-31     S1  0.000000
2  2017-12-31     S1  0.000000
3  2018-12-31     S1  0.000000
4  2019-12-31     S1  0.022313
5  2020-12-31     S1  0.053791
6  2015-12-31     S2  0.000000
7  2016-12-31     S2  0.000000
8  2017-12-31     S2  0.307783
9  2018-12-31     S2  0.000000
10 2019-12-31     S2  0.000000
11 2020-12-31     S2  0.000000
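I want to perform several calculations for each row, based on the value in the 'total' column, and get a DataFrame in the following format (example for the first row):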
        date season     total  calculation_id result
0 2015-12-31     S1  1.000000               1     x1
0 2015-12-31     S1  1.000000               2     x2
0 2015-12-31     S1  1.000000               3     x3
0 2015-12-31     S1  1.000000               4     x4
0 2015-12-31     S1  1.000000               5     x5
Basically something like:
for index, row in df.iterrows():
    for i, a in enumerate(np.linspace(0, row['total'], 6)):
        # assign the result of the calculation to the column result
        pass
Any ideas on how I can do this? For the sake of the example, the result column can be computed as a*5 inside the loop.
Thanks for your help,
皮尔
Answer 0: (score: 0)
One way to do this and "duplicate" the rows is to first create a column list_result that holds, for each row of df, the list of results:
df['list_result'] = df['total'].apply(lambda a: np.linspace(0,a,6)*5)
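On this column you can use stack to create a Series with one row for each value in the list. By setting the index first, you can work directly on the Series: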
df_output = (df.set_index(['date', 'season', 'total'])['list_result']
             # set the index and work on the column list_result
             .apply(pd.Series).stack()  # expand the lists of results into rows
             .reset_index())  # get back the columns 'date', 'season', 'total'
# you can rename the columns
df_output.columns = ['date', 'season', 'total', 'calculation_id', 'result']
The first rows of df_output are:
        date season     total  calculation_id    result
0 2015-12-31     S1  1.000000               0  0.000000
1 2015-12-31     S1  1.000000               1  1.000000
2 2015-12-31     S1  1.000000               2  2.000000
3 2015-12-31     S1  1.000000               3  3.000000
4 2015-12-31     S1  1.000000               4  4.000000
5 2015-12-31     S1  1.000000               5  5.000000
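Note that this is not exactly the output you expected, but it is what you get by using np.linspace(0, getattr(row, 'total'), 6): six values per row, with calculation_id running from 0 to 5. You can change this function when creating list_result.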
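If you want calculation_id to run from 1 to 5 as in the question's example, one possible adjustment is to drop the first point of np.linspace (it is always 0) and shift the ids by one. This is only a sketch: the two-row df is a shortened stand-in for the real data, and skipping the zero point is an assumption, since the question does not say which five values are wanted.

```python
import numpy as np
import pandas as pd

# Shortened stand-in for the df from the question (two rows only).
df = pd.DataFrame({'date': pd.to_datetime(['31/12/2015', '31/12/2019'], format='%d/%m/%Y'),
                   'season': ['S1', 'S1'],
                   'total': [1, 0.022313421]})

# Assumption: skip the first linspace point (always 0) to keep 5 values per row.
df['list_result'] = df['total'].apply(lambda a: np.linspace(0, a, 6)[1:] * 5)

# Same set_index / apply(pd.Series) / stack steps as in the answer above.
df_output = (df.set_index(['date', 'season', 'total'])['list_result']
             .apply(pd.Series).stack()
             .reset_index())
df_output.columns = ['date', 'season', 'total', 'calculation_id', 'result']
df_output['calculation_id'] += 1  # ids 1..5 instead of 0..4
print(df_output)
```

Each input row now produces five output rows instead of six.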
Answer 1: (score: 0)
You can try:
import pandas as pd

df = pd.DataFrame({'date': ['31/12/2015', '31/12/2016', '31/12/2017', '31/12/2018',
                            '31/12/2019', '31/12/2020', '31/12/2015', '31/12/2016',
                            '31/12/2017', '31/12/2018', '31/12/2019', '31/12/2020'],
                   'season': ['S1', 'S1', 'S1', 'S1', 'S1', 'S1', 'S2', 'S2', 'S2', 'S2', 'S2', 'S2'],
                   'total': [1, 0, 0, 0, 0.022313421, 0.053791041, 0, 0, 0.307783314, 0, 0, 0]})
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y')  # the dates are day-first
df['key'] = 1  # add a constant key for the merge
ids = pd.DataFrame({'calculation_id': [1, 2, 3, 4, 5], 'key': 1})
# cartesian product; drop(columns=...) replaces the positional axis argument removed in pandas 2.0
df = pd.merge(df, ids, on='key').drop(columns='key')
df['result'] = df['total'] * df['calculation_id']
print(df)
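The idea is to create another DataFrame containing the calculation IDs, then "cross join" it with your original DataFrame. Finally, multiply total by the calculation ID to get the result.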
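On pandas 1.2 or later, the same cross join can be written without the helper key column by passing how='cross' to pd.merge. A minimal sketch, assuming a shortened two-row df and the illustrative name crossed for the result:

```python
import pandas as pd

# Shortened stand-in for the df from the question (two rows only).
df = pd.DataFrame({'season': ['S1', 'S2'], 'total': [1.0, 0.307783314]})
ids = pd.DataFrame({'calculation_id': [1, 2, 3, 4, 5]})

# how='cross' pairs every row of df with every row of ids (needs pandas >= 1.2).
crossed = pd.merge(df, ids, how='cross')
crossed['result'] = crossed['total'] * crossed['calculation_id']
print(crossed)
```

This avoids adding and then dropping the temporary key column, at the cost of requiring a newer pandas version.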