I have a dataframe df1:
F_Date B_Date
01/09/2019 02/08/2019
01/09/2019 03/08/2019
02/09/2019 03/08/2019
01/09/2019 04/08/2019
02/09/2019 04/08/2019
03/09/2019 04/08/2019
02/09/2019 05/08/2019
03/09/2019 05/08/2019
04/09/2019 05/08/2019
01/09/2019 06/08/2019
02/09/2019 06/08/2019
03/09/2019 06/08/2019
04/09/2019 06/08/2019
05/09/2019 06/08/2019
02/09/2019 07/08/2019
03/09/2019 07/08/2019
04/09/2019 07/08/2019
05/09/2019 07/08/2019
06/09/2019 07/08/2019
02/09/2019 08/08/2019
03/09/2019 08/08/2019
df1 has two columns, so each F_Date maps to multiple B_Date entries.
I also have a second dataframe df2:
F_Date Value
01/09/2019 3000
02/09/2019 3700
03/09/2019 4500
04/09/2019 5000
05/09/2019 7000
06/09/2019 8000
07/09/2019 8300
08/09/2019 9000
09/09/2019 9500
10/09/2019 11000
11/09/2019 12500
12/09/2019 14000
13/09/2019 15000
14/09/2019 17000
15/09/2019 17600
16/09/2019 18000
17/09/2019 18500
18/09/2019 18900
19/09/2019 19000
20/09/2019 19400
21/09/2019 19800
22/09/2019 20500
23/09/2019 21000
24/09/2019 21600
25/09/2019 22000
26/09/2019 22100
27/09/2019 22200
28/09/2019 22500
29/09/2019 22800
30/09/2019 23000
I want to create a new column value_1 in df1, like this:
For each F_Date in df2 there is an associated Value. Each Value needs to be split across the multiple rows in df1 that share that F_Date, so that the assigned values keep increasing. For example:
01/09/2019 has the value 3000 in df2, and there are 51 records for 01/09/2019 in df1. The 3000 then needs to be divided over those 51 records so that each entry is larger than the previous one (creating an increasing trend). I have already sorted df1 by F_Date, and by B_Date within each F_Date.
Sorted df1:
01/09/2019 02/08/2019
01/09/2019 02/08/2019
01/09/2019 03/08/2019
01/09/2019 03/08/2019
01/09/2019 04/08/2019
01/09/2019 06/08/2019
01/09/2019 09/08/2019
01/09/2019 10/08/2019
01/09/2019 10/08/2019
01/09/2019 11/08/2019
01/09/2019 12/08/2019
01/09/2019 12/08/2019
01/09/2019 13/08/2019
01/09/2019 13/08/2019
01/09/2019 13/08/2019
01/09/2019 14/08/2019
01/09/2019 14/08/2019
01/09/2019 14/08/2019
01/09/2019 15/08/2019
01/09/2019 16/08/2019
01/09/2019 17/08/2019
01/09/2019 17/08/2019
01/09/2019 18/08/2019
01/09/2019 18/08/2019
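For reference, the sorting described above can be reproduced in pandas. One caveat worth noting: sorting dd/mm/yyyy strings lexicographically does not give chronological order, so the dates should be parsed first. A minimal sketch on a few of the rows above:

```python
import pandas as pd

# A small subset of df1, with the dates as dd/mm/yyyy strings
df1 = pd.DataFrame({
    'F_Date': ['02/09/2019', '01/09/2019', '01/09/2019'],
    'B_Date': ['03/08/2019', '03/08/2019', '02/08/2019'],
})

# Parse the strings so sorting is chronological, not lexicographic
df1['F_Date'] = pd.to_datetime(df1['F_Date'], format='%d/%m/%Y')
df1['B_Date'] = pd.to_datetime(df1['B_Date'], format='%d/%m/%Y')

# Sort by F_Date, then by B_Date within each F_Date
df1 = df1.sort_values(['F_Date', 'B_Date']).reset_index(drop=True)
print(df1)
```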
Can anyone help?
Answer (score: 2)
You can use exp and then normalize:
import numpy as np
# 51 exponentially growing weights on [0, 1], scaled so they sum to 3000
s = np.exp(np.linspace(0, 1, 51))
s = (s * 3000) / np.sum(s)
np.sum(s)
The sum is:
2999.9999999999995
The series is:
array([34.17786997, 34.86830875, 35.57269531, 36.29131142, 37.02444454,
37.77238794, 38.53544079, 39.31390833, 40.10810196, 40.91833937,
41.74494465, 42.58824847, 43.44858816, 44.32630787, 45.22175868,
46.13529881, 47.06729367, 48.01811607, 48.98814636, 49.97777256,
50.98739054, 52.01740415, 53.06822542, 54.14027469, 55.23398079,
56.34978121, 57.4881223 , 58.64945941, 59.83425708, 61.04298925,
62.27613944, 63.5342009 , 64.8176769 , 66.12708083, 67.46293648,
68.82577819, 70.21615114, 71.63461149, 73.08172663, 74.55807544,
76.06424847, 77.60084822, 79.16848934, 80.76779892, 82.39941668,
84.06399532, 85.76220067, 87.49471205, 89.26222248, 91.06543899,
92.90508288])
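A quick sanity check confirms the two properties the question asks for: the entries are strictly increasing, and they sum to the original value up to floating-point rounding:

```python
import numpy as np

s = np.exp(np.linspace(0, 1, 51))
s = (s * 3000) / np.sum(s)

# Each entry is strictly larger than the previous one
assert np.all(np.diff(s) > 0)
# The total is 3000 up to floating-point rounding
assert np.isclose(np.sum(s), 3000)
```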
Part II: applied to the whole dataframe.
import pandas as pd

# Note the column is named 'F_Date' in both frames, so merge/group on that key
joined_df = pd.merge(df_1, df_2, on='F_Date')

def add_series(grp):
    # Split the group's Value into an increasing, exp-shaped series
    n_rows = grp.shape[0]
    val = grp['Value'].min()
    s = np.exp(np.linspace(0, 1, n_rows))
    s = (s * val) / np.sum(s)
    grp['value_1'] = s
    return grp

joined_df = joined_df.groupby('F_Date').apply(add_series)
Untested, but this should give you the idea.
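Since the answer above is marked as untested, here is a small self-contained sketch of the same idea on toy data (the frame names df_1/df_2 and the column name value_1 follow the question; group_keys=False is passed so the result keeps the original row index rather than adding a group level):

```python
import numpy as np
import pandas as pd

# Toy versions of the question's two frames
df_1 = pd.DataFrame({
    'F_Date': ['01/09/2019'] * 3 + ['02/09/2019'] * 2,
    'B_Date': ['02/08/2019', '03/08/2019', '04/08/2019',
               '03/08/2019', '04/08/2019'],
})
df_2 = pd.DataFrame({
    'F_Date': ['01/09/2019', '02/09/2019'],
    'Value': [3000, 3700],
})

# Attach each F_Date's Value to every matching df_1 row
joined_df = pd.merge(df_1, df_2, on='F_Date')

def add_series(grp):
    # Split the group's Value into an increasing, exp-shaped series
    n_rows = grp.shape[0]
    s = np.exp(np.linspace(0, 1, n_rows))
    grp['value_1'] = (s * grp['Value'].iloc[0]) / np.sum(s)
    return grp

result = joined_df.groupby('F_Date', group_keys=False).apply(add_series)
print(result)
```

Within each F_Date group the value_1 column is strictly increasing and sums to that group's Value, which is exactly the behavior the question describes.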