I'm new to this, so I may be asking a naive question. I want to generate a random dataset with a few constraints:
date_1 - already generated in a csv (dated from 1 August 2018 to 1 August 2019)
date_2 - 60% of the data lies within 30 days of date_1, and 40% lies within 90 days of date_1.
capacity_1 - 3500 kg is the threshold for a day; the total for a given date_2 cannot exceed it.
capacity_2 - the leftover weight for the day, i.e. 3500 - capacity_1 for a particular day.
My date_1 format is d/m/y.
Can someone suggest how I can build the other columns? I plan to generate dummy data with 100,000 rows.
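As a hedged sketch (not part of the original question), the 60%/40% date_2 split could be generated in a vectorized way with numpy; the variable names and the forward-only offsets are assumptions:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10  # use 100_000 for the real dummy data

# date_1 as it would come from the csv (a fixed date here, for illustration)
date_1 = pd.to_datetime(pd.Series(['01/08/2018'] * n), format='%d/%m/%Y')

# 60% of rows get an offset within 30 days, the rest within 90 days
# (the question does not say whether offsets may be negative; forward-only assumed)
within_30 = rng.random(n) < 0.6
offsets = np.where(within_30,
                   rng.integers(0, 31, n),
                   rng.integers(0, 91, n))
date_2 = date_1 + pd.to_timedelta(offsets, unit='D')
```

Drawing the mask per row (rather than slicing the first 60% of the index) keeps the split random even if the csv happens to be sorted.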
EDIT: attached the csv file for the data here
EDIT 2: the input looks like this:
date_1
01/08/2018
01/08/2018
01/08/2018
01/08/2018
01/08/2018
01/08/2018
01/08/2018
01/08/2018
01/08/2018
01/08/2018
01/08/2018
01/08/2018
Expected output:
capacity_2 for a particular date_2 is 3500 - capacity_1. capacity_2 basically gives an idea of how much of the 3500 has already been used on a particular day.
Thanks
Answer 0 (score: 0)
Take a look at this example:
import pandas as pd
import datetime
import random

data = [
    [1, 'a', 'b', '01/08/2018'],
    [2, 'a', 'b', '01/08/2018'],
    [3, 'a', 'b', '01/08/2018'],
    [4, 'a', 'b', '01/08/2018'],
    [5, 'a', 'b', '01/08/2018'],
    [6, 'a', 'b', '01/08/2018'],
    [7, 'a', 'b', '01/08/2018'],
    [8, 'a', 'b', '01/08/2018'],
    [9, 'a', 'b', '01/08/2018'],
    [10, 'a', 'b', '01/08/2018']
]

# in practice this should be created by reading the csv file
df = pd.DataFrame(data)
# keep only the date column, parsed as d/m/Y
df = pd.to_datetime(df[3], format='%d/%m/%Y')

generated_data = []
for index, date_1 in df.items():  # iteritems() was removed in pandas 2.0
    # first 60% of the rows: offset within +-30 days
    if index <= len(df) * 0.6:
        days_to_add = random.randint(-30, 30)
    # remaining 40%: offset within +-90 days
    else:
        days_to_add = random.randint(-90, 90)
    # add the offset to date_1 to get date_2
    date_2 = date_1 + datetime.timedelta(days=days_to_add)
    weight = random.randint(0, 3500)
    capacity_1 = random.randint(0, 3500 - weight)
    capacity_2 = 3500 - capacity_1
    generated_data.append([date_1, date_2, weight, capacity_1, capacity_2])

df2 = pd.DataFrame(generated_data)
print(df2)
First, we keep only the date column of the DataFrame read from the csv. Then we generate a random number of days and add it to date_1 to create date_2. capacity_1 is created with 3500 - weight as its upper bound, and the rest is straightforward.
The output looks like this:
0 2018-08-01 2018-07-06 3136 11 3489
1 2018-08-01 2018-08-11 3476 13 3487
2 2018-08-01 2018-07-28 2620 207 3293
3 2018-08-01 2018-07-22 1976 1437 2063
4 2018-08-01 2018-07-06 3057 19 3481
5 2018-08-01 2018-08-19 773 1481 2019
6 2018-08-01 2018-08-06 823 2002 1498
7 2018-08-01 2018-07-01 1166 2200 1300
8 2018-08-01 2018-10-22 156 2567 933
9 2018-08-01 2018-06-18 1248 1842 1658
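One constraint the example above does not enforce is that the total capacity across all rows sharing the same date_2 must stay within 3500 kg. A hedged sketch of one way to track that, under the (assumed) reading that "leftout weight for the day" means the remaining capacity after each row; names like `remaining` and `THRESHOLD` are illustrative:

```python
import datetime
import random

THRESHOLD = 3500  # kg allowed per day

random.seed(0)
remaining = {}  # date_2 -> capacity still available on that day
rows = []
base = datetime.date(2018, 8, 1)

for _ in range(10):
    date_2 = base + datetime.timedelta(days=random.randint(0, 30))
    left = remaining.get(date_2, THRESHOLD)
    capacity_1 = random.randint(0, left)  # never exceeds what the day has left
    capacity_2 = left - capacity_1        # leftover weight for that day after this row
    remaining[date_2] = capacity_2
    rows.append([date_2, capacity_1, capacity_2])
```

Because each draw is bounded by the running `remaining` value, the per-day sum of capacity_1 can never exceed the 3500 kg threshold, which the independent `random.randint(0, 3500)` draws in the answer do not guarantee.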