Generate random values and map them to a column based on a condition in pandas

Time: 2020-05-13 07:12:22

Tags: python pandas numpy random faker

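I am trying to generate a synthetic dataset. I have managed to generate several columns, but I need to generate a column of random numbers whose range depends on the value in another column.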

import random

import numpy as np
import pandas as pd

def create_trans_dataset(num=1):
    # `check` is defined elsewhere (not shown here); presumably a sequence of candidate dates.
    output=[
            {"trans_date": np.random.choice(check),
             "trans_details":np.random.choice(["airtime_purchase",
                                               "customer_transfer",
                                               "deposit_funds",
                                               "withdrawal"],
                                              # p needs one probability per choice, summing to 1
                                              p=[0.2, 0.2, 0.2, 0.4]),
             "trans_status": np.random.choice(["completed", "reversed",
                                               "processing"],
                                               p=[0.9, 0.05, 0.05])
           }
            for x in range(num)
          ]
    return output

trans_dataset = pd.DataFrame(create_trans_dataset(num=20))

def map_values(row, values_dict):
    return values_dict[row]

# Each random.randint call runs once, when the dict is built,
# so every category maps to a single fixed amount.
values_dict = {"airtime_purchase": random.randint(5, 5000),
               "customer_transfer": random.randint(100, 35000),
               "deposit_funds": random.randint(100, 35000),
               "withdrawal": random.randint(100, 35000)
            }

trans_dataset['amount_transacted'] = trans_dataset['trans_details'].apply(map_values, args = (values_dict,))

My current solution generates a single constant for each of "airtime_purchase", "customer_transfer", "deposit_funds" and "withdrawal". My current output is:

   trans_date  trans_details      trans_status  amount_transacted
0  2020-02-27  customer_transfer  completed     30165
1  2020-03-03  airtime_purchase   completed     14945
2  2020-01-02  withdrawal         completed     14595
3  2020-01-01  withdrawal         completed     26700
4  2020-02-18  airtime_purchase   completed     22860
5  2020-02-22  airtime_purchase   completed     17930
6  2020-01-01  airtime_purchase   completed     24370
7  2020-01-20  customer_transfer  completed     8735
8  2020-03-12  deposit_funds      completed     1065
9  2020-03-20  airtime_purchase   completed     27170

My desired output has a separate random number for every customer_transfer, airtime_purchase, deposit_funds and withdrawal row, as shown below.

   trans_date  trans_details      trans_status  amount_transacted
0  2020-02-27  customer_transfer  completed     3015
1  2020-03-03  airtime_purchase   completed     1495
2  2020-01-02  withdrawal         completed     1595
3  2020-01-01  withdrawal         completed     2600
4  2020-02-18  airtime_purchase   completed     2890
5  2020-02-22  airtime_purchase   completed     930
6  2020-01-01  airtime_purchase   completed     370
7  2020-01-20  customer_transfer  completed     9635
8  2020-03-12  deposit_funds      completed     5005
9  2020-03-20  airtime_purchase   completed     2817

1 Answer:

Answer 0: (score: 1)

I think you can do it like this:

import numpy as np
import pandas as pd

def create_trans_dataset(num=1):
    output=[
            # Placeholder integer in place of a date, since `check` is not shown in the question.
            {"trans_date": np.random.randint(0,100),
             "trans_details":np.random.choice(["airtime_purchase",
                                               "customer_transfer",
                                               "deposit_funds",
                                               "withdrawal"],
                                              p=[0.2, 0.2, 0.2, 0.4]),
             "trans_status": np.random.choice(["completed", "reversed",
                                               "processing"],
                                               p=[0.9, 0.05, 0.05])
           }
            for x in range(num)
          ]
    return output

trans_dataset = pd.DataFrame(create_trans_dataset(num=100))
# Keep the original labels, since trans_details is overwritten below.
trans_dataset['original_trans_details'] = trans_dataset['trans_details'].copy()

# Build the mask once, before any label is replaced by a number.
is_airtime = trans_dataset.trans_details == 'airtime_purchase'
# One independent draw per row; np.random.randint excludes the upper bound.
trans_dataset.loc[~is_airtime, 'trans_details'] = np.random.randint(100, 35000, (~is_airtime).sum())
trans_dataset.loc[is_airtime, 'trans_details'] = np.random.randint(5, 5000, is_airtime.sum())

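This generates random numbers between 100 and 35000 for customer_transfer, deposit_funds and withdrawal, and random numbers between 5 and 5000 for airtime_purchase. In both cases each row gets its own independent draw rather than one shared value per category.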
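If the goal is a separate amount_transacted column, as in the question's desired output, one possible variation is sketched below. It is not the answerer's code: the `AMOUNT_RANGES` dict and the `add_random_amounts` helper are hypothetical names introduced for illustration, and the bounds are simply the ranges quoted above, with the upper bound treated as exclusive. It relies on NumPy's `Generator.integers` accepting array-valued bounds, so each row is drawn from the range that belongs to its own category.

```python
import numpy as np
import pandas as pd

# Hypothetical per-category bounds (low inclusive, high exclusive),
# copied from the ranges used in the question and the answer.
AMOUNT_RANGES = {
    "airtime_purchase": (5, 5000),
    "customer_transfer": (100, 35000),
    "deposit_funds": (100, 35000),
    "withdrawal": (100, 35000),
}

def add_random_amounts(df, ranges=AMOUNT_RANGES, seed=None):
    """Return a copy of df with an 'amount_transacted' column drawn per row."""
    rng = np.random.default_rng(seed)
    out = df.copy()
    # Look up each row's lower and upper bound from its category label.
    low = out["trans_details"].map(lambda k: ranges[k][0]).to_numpy(dtype=np.int64)
    high = out["trans_details"].map(lambda k: ranges[k][1]).to_numpy(dtype=np.int64)
    # Array-valued bounds are broadcast, so every row gets its own
    # independent draw from its own range.
    out["amount_transacted"] = rng.integers(low, high)
    return out

# Small demonstration frame; only the trans_details column is needed.
demo = pd.DataFrame({"trans_details": ["airtime_purchase", "withdrawal",
                                        "customer_transfer", "airtime_purchase"]})
result = add_random_amounts(demo, seed=0)
print(result)
```

Because the labels in trans_details are left untouched, no backup column is needed, and supporting a new category only requires another entry in the dict. Passing a seed makes the draws reproducible.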
