I have a DataFrame:
I_Code Date_1 Date_2
2 14/09/2019 16/08/2019
2 14/09/2019 17/08/2019
2 14/09/2019 19/08/2019
2 14/09/2019 20/08/2019
2 14/09/2019 21/08/2019
2 14/09/2019 21/08/2019
2 14/09/2019 21/08/2019
2 14/09/2019 22/08/2019
2 14/09/2019 23/08/2019
2 14/09/2019 23/08/2019
2 14/09/2019 24/08/2019
2 14/09/2019 27/08/2019
2 14/09/2019 28/08/2019
2 14/09/2019 28/08/2019
2 14/09/2019 29/08/2019
2 14/09/2019 04/09/2019
2 14/09/2019 04/09/2019
2 14/09/2019 04/09/2019
2 14/09/2019 05/09/2019
2 14/09/2019 08/09/2019
2 14/09/2019 10/09/2019
2 14/09/2019 10/09/2019
2 14/09/2019 12/09/2019
2 03/09/2019 04/08/2019
2 03/09/2019 05/08/2019
2 03/09/2019 06/08/2019
2 03/09/2019 07/08/2019
2 03/09/2019 07/08/2019
2 03/09/2019 08/08/2019
2 03/09/2019 08/08/2019
2 03/09/2019 09/08/2019
2 03/09/2019 13/08/2019
2 03/09/2019 13/08/2019
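I currently have 800 entries in the DataFrame. I want to expand this dataset to 20k entries with a constraint on Date_2: the number of entries on Date_2 (counted as monthly totals) should follow a logarithmic growth trend, i.e. rise at first and then plateau. (Image attached: Comparator)
Please note that the plot is only an example.
Previously, I was able to get this kind of plot with the following function: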
import datetime

import numpy as np
from scipy.stats import expon

def random_dates(start, end, starting_prob=0.1, ending_prob=1.0, date_format='%d-%m-%y', num_samples=20000):
    start_date = datetime.datetime.strptime(start, date_format)
    end_date = datetime.datetime.strptime(end, date_format)
    # Number of days between `start` and `end`
    num_days = (end_date - start_date).days
    # Exponential CDF over an evenly spaced grid: rises quickly, then flattens out
    probabilities = expon.cdf(np.linspace(starting_prob, ending_prob, num_days), scale=0.3)
    # Normalize the probabilities so they sum to 1
    probabilities /= np.sum(probabilities)
    # Draw day offsets in [0, num_days), so `end` itself is never sampled
    rand_days = np.random.choice(num_days, size=num_samples, replace=True, p=probabilities)
    rand_date = [(start_date + datetime.timedelta(days=int(day))).strftime(date_format)
                 for day in rand_days]
    # Return a list of date strings
    return rand_date

start_date = '02-08-19'
end_date = '29-09-19'
date_format = '%d-%m-%y'
sample_count = 20000
date_2 = random_dates(start_date, end_date, starting_prob=0.1, ending_prob=1.0, date_format=date_format, num_samples=sample_count)
But now the other variables (i.e. Date_1 and I_Code) are tied to Date_2 as well. They have no such constraint of their own.
Can anyone help?
Thanks
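One possible direction, shown only as a minimal sketch and not as a definitive solution, is to resample whole rows with replacement so that I_Code and Date_1 travel together with Date_2. The helper name `resample_rows`, its parameters, and the tiny four-row frame are hypothetical and exist only for illustration. The sketch assumes the dates are strings in `%d/%m/%Y` format, as in the table above, and it reuses the same saturating curve as `random_dates`, written as `1 - exp(-x / scale)`, which is the exponential CDF.

```python
import numpy as np
import pandas as pd


def resample_rows(df, num_samples=20000, starting_prob=0.1, ending_prob=1.0,
                  scale=0.3, date_format="%d/%m/%Y", seed=None):
    """Draw whole rows with replacement so Date_2 follows a saturating curve."""
    dates = pd.to_datetime(df["Date_2"], format=date_format)
    # Map each row's Date_2 onto [starting_prob, ending_prob] by its position
    # within the observed date range.
    span_days = max((dates.max() - dates.min()).days, 1)
    offset = (dates - dates.min()).dt.days.to_numpy()
    x = starting_prob + (ending_prob - starting_prob) * offset / span_days
    # Exponential CDF: rises quickly at first, then flattens out.
    target = 1.0 - np.exp(-x / scale)
    # Divide by the number of rows sharing the same Date_2, so the per-date
    # probability follows the target curve rather than the existing counts.
    rows_per_date = dates.map(dates.value_counts()).to_numpy()
    weights = target / rows_per_date
    weights /= weights.sum()
    rng = np.random.default_rng(seed)
    picked = rng.choice(len(df), size=num_samples, replace=True, p=weights)
    return df.iloc[picked].reset_index(drop=True)


# Hypothetical four-row frame using values from the table above.
df = pd.DataFrame({
    "I_Code": [2, 2, 2, 2],
    "Date_1": ["14/09/2019", "14/09/2019", "03/09/2019", "03/09/2019"],
    "Date_2": ["16/08/2019", "12/09/2019", "04/08/2019", "13/08/2019"],
})
expanded = resample_rows(df, num_samples=20000, seed=0)
print(expanded["Date_2"].value_counts())
```

Because complete rows are drawn, every generated row keeps a combination of I_Code, Date_1 and Date_2 that already exists in the data. The trade-off is that only Date_2 values already present in the 800 rows can appear in the result, so gaps in the observed dates remain gaps.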