Generate random data for dates based on some constraints

Time: 2019-08-26 08:12:56

Tags: python data-generation

I am new to this and may be asking a naive question. I want to generate a random dataset with some constraints:

date_1 - already generated in the csv (dates from 1 August 2018 to 1 August 2019)
date_2 - 60% of the data lies within 30 days of date_1 and 40% of the data lies within 90 days of date_1.

capacity_1 - 3500 kg is the threshold for a day. It cannot exceed that threshold for date_2.
capacity_2 - leftover weight for the day. It is 3500 - capacity_1 for a particular day.

My date_1 format is d/m/y.

Can someone suggest how I can generate the other columns? I plan to build dummy data with 100,000 rows.

Edit: attached the csv file for the data here.

EDIT2: The input looks like this:

date_1   

01/08/2018
01/08/2018
01/08/2018
01/08/2018
01/08/2018
01/08/2018
01/08/2018
01/08/2018
01/08/2018
01/08/2018
01/08/2018
01/08/2018

Expected output:

(image not preserved)

For a particular date_2, capacity_2 is 3500 - capacity_1. capacity_2 basically gives an idea of how much of the 3500 has been used on a particular date.

Thanks

1 Answer:

Answer 0: (score: 0)

Take a look at this example:

import pandas as pd
import datetime
import random

data = [
  [1,'a','b','01/08/2018'],
  [2,'a','b','01/08/2018'],
  [3,'a','b','01/08/2018'],
  [4,'a','b','01/08/2018'],
  [5,'a','b','01/08/2018'],
  [6,'a','b','01/08/2018'],
  [7,'a','b','01/08/2018'],
  [8,'a','b','01/08/2018'],
  [9,'a','b','01/08/2018'],
  [10,'a','b','01/08/2018']
  ]
# this should be created by reading the csv file
df = pd.DataFrame(data) 

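# keep only the date column (column 3) and parse it as d/m/Y
dates = pd.to_datetime(df[3], format='%d/%m/%Y')

generated_data = []
# Series.items() replaces iteritems(), which was removed in pandas 2.0
for index, date_1 in dates.items():
  # the first 60% of the rows get an offset of +-30 days, the rest +-90 days
  if index < len(dates) * 0.6:
    days_to_add = random.randint(-30, 30)
  else:
    days_to_add = random.randint(-90, 90)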

  # add that day to the date_1
  date_2 = date_1 + datetime.timedelta(days=days_to_add)
  weight = random.randint(0,3500)
  capacity_1 = random.randint(0, 3500 - weight)
  capacity_2 = 3500 - capacity_1
  generated_data.append([date_1, date_2, weight, capacity_1, capacity_2])

df2 = pd.DataFrame(generated_data)
print(df2)

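First, we take only the date column (column index 3) from the data read from the CSV. Then we add the randomly generated number of days to date_1 to create date_2. capacity_1 is generated with 3500 - weight as its upper bound, and capacity_2 is simply 3500 - capacity_1. The output looks like this: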

0 2018-08-01 2018-07-06  3136    11  3489
1 2018-08-01 2018-08-11  3476    13  3487
2 2018-08-01 2018-07-28  2620   207  3293
3 2018-08-01 2018-07-22  1976  1437  2063
4 2018-08-01 2018-07-06  3057    19  3481
5 2018-08-01 2018-08-19   773  1481  2019
6 2018-08-01 2018-08-06   823  2002  1498
7 2018-08-01 2018-07-01  1166  2200  1300
8 2018-08-01 2018-10-22   156  2567   933
9 2018-08-01 2018-06-18  1248  1842  1658
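
For 100,000 rows, the row-by-row loop above can be replaced with a vectorized version. The following is only a sketch under some assumptions that the question does not confirm: date_2 is never earlier than date_1, "within 90 days" means 31 to 90 days for the remaining 40%, and the 60% of rows are picked at random instead of being the first rows. The function name generate_rows and its parameters are made up for illustration, and capacity_1 is drawn independently per row, so a per-day total across rows is not enforced.

```python
import numpy as np
import pandas as pd


def generate_rows(date_1, daily_limit=3500, seed=0):
    """Build date_2, capacity_1 and capacity_2 for a list of d/m/Y strings.

    Hypothetical helper: the name and parameters are not from the question.
    """
    rng = np.random.default_rng(seed)
    n = len(date_1)
    out = pd.DataFrame(
        {"date_1": pd.to_datetime(pd.Series(date_1), format="%d/%m/%Y")}
    )

    # Boolean mask with exactly round(0.6 * n) True values, shuffled so the
    # "near" rows are spread randomly over the dataset.
    near = np.zeros(n, dtype=bool)
    near[: round(0.6 * n)] = True
    rng.shuffle(near)

    # Assumption: offsets are non-negative; 0..30 days for the near rows,
    # 31..90 days for the others (upper bound of integers() is exclusive).
    offsets = np.where(
        near,
        rng.integers(0, 31, size=n),
        rng.integers(31, 91, size=n),
    )
    out["date_2"] = out["date_1"] + pd.to_timedelta(offsets, unit="D")

    # Assumption: capacity_1 is uniform in 0..daily_limit for each row.
    out["capacity_1"] = rng.integers(0, daily_limit + 1, size=n)
    out["capacity_2"] = daily_limit - out["capacity_1"]
    return out


df = generate_rows(["01/08/2018"] * 100_000)
print(df.head())
```

In practice the list of strings would come from the date_1 column of the csv file. If the per-day threshold has to hold across all rows that share a date_2, the capacity columns would need an extra grouping step that this sketch does not attempt.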