Generate a new column in a dataframe based on another dataframe and a few constraints

Time: 2019-10-30 10:34:54

Tags: python pandas numpy

I have a dataframe df1

F_Date      B_Date
01/09/2019  02/08/2019
01/09/2019  03/08/2019
02/09/2019  03/08/2019
01/09/2019  04/08/2019
02/09/2019  04/08/2019
03/09/2019  04/08/2019
02/09/2019  05/08/2019
03/09/2019  05/08/2019
04/09/2019  05/08/2019
01/09/2019  06/08/2019
02/09/2019  06/08/2019
03/09/2019  06/08/2019
04/09/2019  06/08/2019
05/09/2019  06/08/2019
02/09/2019  07/08/2019
03/09/2019  07/08/2019
04/09/2019  07/08/2019
05/09/2019  07/08/2019
06/09/2019  07/08/2019
02/09/2019  08/08/2019
03/09/2019  08/08/2019

df1 has two columns, and each F_Date has multiple B_Date values.

I also have another dataframe df2

F_Date  Value
01/09/2019  3000
02/09/2019  3700
03/09/2019  4500
04/09/2019  5000
05/09/2019  7000
06/09/2019  8000
07/09/2019  8300
08/09/2019  9000
09/09/2019  9500
10/09/2019  11000
11/09/2019  12500
12/09/2019  14000
13/09/2019  15000
14/09/2019  17000
15/09/2019  17600
16/09/2019  18000
17/09/2019  18500
18/09/2019  18900
19/09/2019  19000
20/09/2019  19400
21/09/2019  19800
22/09/2019  20500
23/09/2019  21000
24/09/2019  21600
25/09/2019  22000
26/09/2019  22100
27/09/2019  22200
28/09/2019  22500
29/09/2019  22800
30/09/2019  23000

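I want to create a new column value_1 in df1 such that:

Every F_Date in df2 has an associated value. Each value needs to be split across the multiple df1 entries for that F_Date so that the assigned values keep increasing. For example:

01/09/2019 has a value of 3000 in df2, and 01/09/2019 has 51 records in df1. The 3000 then needs to be divided among those 51 records so that each entry is greater than the previous one (creating an increasing trend). I have already sorted df1 by F_Date and, within each F_Date, by B_Date.

Sorted df1: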

01/09/2019  02/08/2019
01/09/2019  02/08/2019
01/09/2019  03/08/2019
01/09/2019  03/08/2019
01/09/2019  04/08/2019
01/09/2019  06/08/2019
01/09/2019  09/08/2019
01/09/2019  10/08/2019
01/09/2019  10/08/2019
01/09/2019  11/08/2019
01/09/2019  12/08/2019
01/09/2019  12/08/2019
01/09/2019  13/08/2019
01/09/2019  13/08/2019
01/09/2019  13/08/2019
01/09/2019  14/08/2019
01/09/2019  14/08/2019
01/09/2019  14/08/2019
01/09/2019  15/08/2019
01/09/2019  16/08/2019
01/09/2019  17/08/2019
01/09/2019  17/08/2019
01/09/2019  18/08/2019
01/09/2019  18/08/2019

Can anyone help?

1 answer:

Answer 0: (score: 2)

You can use exp and then normalize

import numpy as np
s = np.exp(np.linspace(0,1,51)) 
s = (s * 3000)/ np.sum(s)
np.sum(s)

The sum is

2999.9999999999995

The series is

array([34.17786997, 34.86830875, 35.57269531, 36.29131142, 37.02444454,
       37.77238794, 38.53544079, 39.31390833, 40.10810196, 40.91833937,
       41.74494465, 42.58824847, 43.44858816, 44.32630787, 45.22175868,
       46.13529881, 47.06729367, 48.01811607, 48.98814636, 49.97777256,
       50.98739054, 52.01740415, 53.06822542, 54.14027469, 55.23398079,
       56.34978121, 57.4881223 , 58.64945941, 59.83425708, 61.04298925,
       62.27613944, 63.5342009 , 64.8176769 , 66.12708083, 67.46293648,
       68.82577819, 70.21615114, 71.63461149, 73.08172663, 74.55807544,
       76.06424847, 77.60084822, 79.16848934, 80.76779892, 82.39941668,
       84.06399532, 85.76220067, 87.49471205, 89.26222248, 91.06543899,
       92.90508288])
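
The two properties the question asks for (each entry larger than the previous one, and the entries adding up to the value from df2) can be checked directly. The following is a minimal sketch for the 01/09/2019 case; the names n_records, total, strictly_increasing, sums_to_total and ratio are introduced here for illustration only.

```python
import numpy as np

n_records = 51   # rows in df1 for 01/09/2019, per the question
total = 3000     # Value for 01/09/2019 in df2

# Exponential weights on [0, 1], rescaled so they add up to `total`.
s = np.exp(np.linspace(0, 1, n_records))
s = s * total / s.sum()

# Every consecutive difference must be positive for an increasing trend.
strictly_increasing = bool(np.all(np.diff(s) > 0))

# Compare with a tolerance: floating-point rounding can leave the sum
# a hair below 3000, as in the output above.
sums_to_total = bool(np.isclose(s.sum(), total))

# Last entry divided by first entry: exp(1) / exp(0).
ratio = s[-1] / s[0]
print(strictly_increasing, sums_to_total, ratio)
```

Because the weights run from exp(0) to exp(1), the last entry is always about 2.718 times the first, regardless of the group size. Widening the linspace range would make the trend steeper, though that is a variation on the answer rather than part of it.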

Part II, applied to the whole df

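import numpy as np
import pandas as pd

# Attach each F_Date's Value from df2 to the matching rows of df1.
# df1 should already be sorted by F_Date and B_Date.
joined_df = pd.merge(df1, df2, on='F_Date')

def add_series(values):
    # values holds the (constant) Value of one F_Date group
    n_rows = len(values)
    val = values.min()
    # Increasing exponential weights, rescaled so they sum to val
    s = np.exp(np.linspace(0, 1, n_rows))
    return (s * val) / np.sum(s)

# transform keeps the original row order and index of joined_df
joined_df['value_1'] = joined_df.groupby('F_Date')['Value'].transform(add_series)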

**Not tested, but this should give you the idea.
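
For anyone who wants to try the idea end to end, here is a self-contained sketch on a small subset of the question's data. It is one possible vectorized variant, not the answer's exact code: the names merged, pos, size, t and weights are made up for this example, and the day-first date parsing is an assumption about the dd/mm/yyyy format shown in the question.

```python
import numpy as np
import pandas as pd

# Small stand-in data (a few rows from the question, not the full frames).
df1 = pd.DataFrame({
    "F_Date": ["01/09/2019", "01/09/2019", "02/09/2019", "01/09/2019", "02/09/2019"],
    "B_Date": ["02/08/2019", "03/08/2019", "03/08/2019", "04/08/2019", "04/08/2019"],
})
df2 = pd.DataFrame({
    "F_Date": ["01/09/2019", "02/09/2019"],
    "Value": [3000, 3700],
})

# Assumption: dates are day-first. Parsing them makes the sort chronological
# instead of alphabetical.
df1["F_Date"] = pd.to_datetime(df1["F_Date"], format="%d/%m/%Y")
df1["B_Date"] = pd.to_datetime(df1["B_Date"], format="%d/%m/%Y")
df2["F_Date"] = pd.to_datetime(df2["F_Date"], format="%d/%m/%Y")

df1 = df1.sort_values(["F_Date", "B_Date"]).reset_index(drop=True)
merged = df1.merge(df2, on="F_Date", how="left")

# Position of each row inside its F_Date group, and the size of that group.
grouped = merged.groupby("F_Date")
pos = grouped.cumcount()
size = grouped["B_Date"].transform("size")

# Map positions onto [0, 1]; the clip avoids dividing by zero for
# single-row groups, which then get t = 0 and receive the whole value.
t = pos / (size - 1).clip(lower=1)
weights = np.exp(t)

# Normalise the weights within each group and scale by the group's Value.
weight_sums = weights.groupby(merged["F_Date"]).transform("sum")
merged["value_1"] = weights / weight_sums * merged["Value"]

print(merged)
```

Within each F_Date the value_1 entries increase with B_Date and add up to that date's Value from df2. With how="left", any df1 date missing from df2 ends up with NaN in value_1 instead of being dropped.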