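I am trying to generate a synthetic dataset. I have managed to generate a few columns, but I need to generate a column of random numbers whose range depends on the value in another column.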
import numpy as np
import pandas as pd

def create_trans_dataset(num=1):
    # `check` is the collection of candidate dates; it is defined elsewhere.
    output = [
        {"trans_date": np.random.choice(check),
         "trans_details": np.random.choice(["airtime_purchase",
                                            "customer_transfer",
                                            "deposit_funds",
                                            "withdrawal"],
                                           # p needs one probability per choice, summing to 1
                                           p=[0.2, 0.2, 0.2, 0.4]),
         "trans_status": np.random.choice(["completed", "reversed",
                                           "processing"],
                                          p=[0.9, 0.05, 0.05])
         }
        for x in range(num)
    ]
    return output

trans_dataset = pd.DataFrame(create_trans_dataset(num=20))
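import random

def map_values(value, values_dict):
    # Look up the amount stored for this transaction type.
    return values_dict[value]

# Each random.randint call runs once, when the dict is built,
# so every transaction type ends up with a single fixed number.
values_dict = {"airtime_purchase": random.randint(5, 5000),
               "customer_transfer": random.randint(100, 35000),
               "deposit_funds": random.randint(100, 35000),
               "withdrawal": random.randint(100, 35000)
               }
trans_dataset['amount_transacted'] = trans_dataset['trans_details'].apply(map_values, args=(values_dict,))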
My current solution generates one constant number for each of "airtime_purchase", "customer_transfer", "deposit_funds" and "withdrawal". My current output is:
   trans_date      trans_details  trans_status  amount_transacted
0  2020-02-27  customer_transfer     completed              30165
1  2020-03-03   airtime_purchase     completed              14945
2  2020-01-02         withdrawal     completed              14595
3  2020-01-01         withdrawal     completed              26700
4  2020-02-18   airtime_purchase     completed              22860
5  2020-02-22   airtime_purchase     completed              17930
6  2020-01-01   airtime_purchase     completed              24370
7  2020-01-20  customer_transfer     completed               8735
8  2020-03-12      deposit_funds     completed               1065
9  2020-03-20   airtime_purchase     completed              27170
The output I want has a separate random number for every customer_transfer, airtime_purchase, deposit_funds and withdrawal row, as shown below.
   trans_date      trans_details  trans_status  amount_transacted
0  2020-02-27  customer_transfer     completed               3015
1  2020-03-03   airtime_purchase     completed               1495
2  2020-01-02         withdrawal     completed               1595
3  2020-01-01         withdrawal     completed               2600
4  2020-02-18   airtime_purchase     completed               2890
5  2020-02-22   airtime_purchase     completed                930
6  2020-01-01   airtime_purchase     completed                370
7  2020-01-20  customer_transfer     completed               9635
8  2020-03-12      deposit_funds     completed               5005
9  2020-03-20   airtime_purchase     completed               2817
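The one-number-per-type behaviour described in the question follows from when `random.randint` runs: inside a dict literal it is called once per key, at the moment the dict is created, and the lookup afterwards only returns that stored number. A minimal sketch of one possible fix is to store the bounds instead and draw inside the function applied to each row. The names `AMOUNT_BOUNDS`, `draw_amount` and `demo` are hypothetical, and the bounds are simply copied from the question.

```python
import random

import pandas as pd

# (low, high) bounds per transaction type, copied from the question.
# Storing the bounds rather than random.randint(low, high) defers the draw.
AMOUNT_BOUNDS = {
    "airtime_purchase": (5, 5000),
    "customer_transfer": (100, 35000),
    "deposit_funds": (100, 35000),
    "withdrawal": (100, 35000),
}


def draw_amount(trans_type, bounds=AMOUNT_BOUNDS):
    """Draw a fresh amount for one row; random.randint includes both ends."""
    low, high = bounds[trans_type]
    return random.randint(low, high)


# Small hypothetical frame standing in for trans_dataset.
demo = pd.DataFrame(
    {"trans_details": ["airtime_purchase", "withdrawal",
                       "airtime_purchase", "deposit_funds"]}
)

# apply calls draw_amount once per row, so each row gets its own draw.
demo["amount_transacted"] = demo["trans_details"].apply(draw_amount)
print(demo)
```

Because the draw happens per call, two rows with the same transaction type will usually, though not necessarily, receive different amounts.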
Answer 0 (score: 1)
I think you can do this:
import numpy as np
import pandas as pd

def create_trans_dataset(num=1):
    output = [
        # trans_date is a placeholder integer here instead of a real date
        {"trans_date": np.random.randint(0, 100),
         "trans_details": np.random.choice(["airtime_purchase",
                                            "customer_transfer",
                                            "deposit_funds",
                                            "withdrawal"],
                                           p=[0.2, 0.2, 0.2, 0.4]),
         "trans_status": np.random.choice(["completed", "reversed",
                                           "processing"],
                                          p=[0.9, 0.05, 0.05])
         }
        for x in range(num)
    ]
    return output

trans_dataset = pd.DataFrame(create_trans_dataset(num=100))
# Keep the transaction type, because trans_details is overwritten below.
trans_dataset['original_trans_details'] = trans_dataset['trans_details'].copy()
# Compute the mask once, before any value in trans_details is replaced.
is_airtime = trans_dataset['trans_details'] == 'airtime_purchase'
# One independent draw per matching row (the upper bound is exclusive).
trans_dataset.loc[~is_airtime, 'trans_details'] = np.random.randint(100, 35000, (~is_airtime).sum())
trans_dataset.loc[is_airtime, 'trans_details'] = np.random.randint(5, 5000, is_airtime.sum())
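This generates random numbers from 100 up to (but not including) 35000 for customer_transfer, deposit_funds and withdrawal, and from 5 up to (but not including) 5000 for airtime_purchase. Each row gets its own independent draw, so the values vary from row to row instead of being one constant per type.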
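If the goal is the layout from the question, with `trans_details` left untouched and the numbers in a separate `amount_transacted` column, the same idea can be sketched with one bounds table and a single vectorized draw. This is only an illustration under stated assumptions: `add_amounts`, `AMOUNT_BOUNDS`, `demo` and the seed are hypothetical, the bounds are copied from the question, and NumPy's `Generator.integers` is swapped in for `np.random.randint` because it accepts array bounds and an inclusive upper end.

```python
import numpy as np
import pandas as pd

# (low, high) bounds per transaction type, copied from the question.
AMOUNT_BOUNDS = {
    "airtime_purchase": (5, 5000),
    "customer_transfer": (100, 35000),
    "deposit_funds": (100, 35000),
    "withdrawal": (100, 35000),
}


def add_amounts(df, bounds, rng):
    """Return a copy of df with a per-row random amount_transacted column."""
    out = df.copy()
    lows = {name: low for name, (low, high) in bounds.items()}
    highs = {name: high for name, (low, high) in bounds.items()}
    # Translate each row's transaction type into its own lower/upper bound.
    low = out["trans_details"].map(lows).to_numpy()
    high = out["trans_details"].map(highs).to_numpy()
    # integers() broadcasts the bound arrays, giving one draw per row;
    # endpoint=True makes the upper bound inclusive.
    out["amount_transacted"] = rng.integers(low, high, endpoint=True)
    return out


rng = np.random.default_rng(seed=0)  # arbitrary seed, only for repeatability
demo = pd.DataFrame(
    {"trans_details": ["customer_transfer", "airtime_purchase",
                       "withdrawal", "deposit_funds"]}
)
print(add_amounts(demo, AMOUNT_BOUNDS, rng))
```

Keeping the bounds in one dict means a new transaction type only needs a new entry, and the transaction type column never has to be overwritten or copied.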