Question

vsample_data = credit_card.sample(n=520, replace='False')

print(vsample_data)

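Here I am trying to draw 520 data points from the dataset, but I cannot get a correct sample in which the two classes of the credit card fraud dataset, class 0 (non-fraud) and class 1 (fraud), have equal probability.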

Answer 1

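import pandas as pd
d = {'actions': [1, 2, 1, 6, 4], 'fraud': [True, False, True, True, False]}
df = pd.DataFrame(data=d)
frauds, normal = df[df.fraud], df[~df.fraud]  # split the rows by class
# Draw the same number of rows from each class; replace must be the boolean False, not the string 'False'
print(pd.concat([frauds.sample(n=1, replace=False), normal.sample(n=1, replace=False)]))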
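
The same idea can be applied to a larger draw such as the 520 rows in the question by taking an equal number of rows from each class. The following is a minimal sketch, not a definitive implementation: the `credit_card` frame here is a small synthetic stand-in, and the column names `Class` and `amount` and the helper `balanced_sample` are assumptions made for illustration. It relies on `DataFrameGroupBy.sample`, which is available in pandas 1.1 and later.

```python
import pandas as pd

# Hypothetical stand-in for the credit card data: 1 = fraud, 0 = non-fraud.
credit_card = pd.DataFrame({
    "amount": range(1000),
    "Class": [1 if i % 10 == 0 else 0 for i in range(1000)],
})

def balanced_sample(df, label_col, n_total, random_state=None):
    """Draw n_total rows with an equal number of rows from each class."""
    n_classes = df[label_col].nunique()
    per_class = n_total // n_classes
    # GroupBy.sample draws per_class rows from every group,
    # without replacement by default.
    return df.groupby(label_col).sample(n=per_class, random_state=random_state)

sample = balanced_sample(credit_card, "Class", 40, random_state=0)
print(sample["Class"].value_counts())
```

If a class has fewer rows than the requested per-class count, the call raises a `ValueError` unless `replace=True` is passed, so the rarer class limits how large a balanced sample without replacement can be.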

Answer 2

Creating the fraud DataFrame

I will use a 10% probability of fraud cases:

import random
import numpy as np
import pandas as pd
data = pd.DataFrame({'val': [random.randint(0, 1000) for _ in range(1000)],
                     'fraud': list(np.random.binomial(1, 0.1, 1000))})
data.head(10)

[OUT]

   fraud  val
0      0  359
1      0  731
2      0  146
3      0  975
4      0  295
5      0  467
6      0  366
7      1   69
8      0   18
9      0  297

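Fraud cases should be weighted about 9 times more heavily than non-fraud cases, because they are roughly 9 times rarer (10% versus 90%). The code below gives fraud rows a weight of 10 (9 + 1) and non-fraud rows a weight of 1, which slightly over-weights fraud relative to an exact 9:1 ratio.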

data['weights'] = data.fraud * 9
data['weights'] += 1
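
When the fraud rate is not known in advance, the weights can be derived from the data instead of hard-coding 9. The sketch below is one possible variant, not the method used in this answer: it weights each row by the inverse of its class size, so both classes carry the same total weight. The seeded generator `rng` and the intermediate name `class_counts` are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = pd.DataFrame({
    "val": rng.integers(0, 1001, size=1000),
    "fraud": rng.binomial(1, 0.1, size=1000),
})

# Weight each row by 1 / (size of its class), so every class
# carries the same total weight regardless of how rare it is.
class_counts = data["fraud"].map(data["fraud"].value_counts())
data["weights"] = 1.0 / class_counts

# Total weight per class: each sums to 1 (up to floating-point error).
totals = data.groupby("fraud")["weights"].sum()
print(totals)
```

`DataFrame.sample` normalizes the weights internally, so they do not need to sum to 1 across the whole frame.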

Weighted sample

spl = data.sample(100,weights=data.weights)
sum(spl.fraud)

[OUT]

Fraud cases make up roughly 50% of the sample.
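
A single draw can land some distance from 50%, so it may help to repeat the weighted draw and look at the spread. The following is a rough sketch under assumptions: the data is regenerated with a seeded generator (`rng`, seed 42) and the draw is repeated 200 times, neither of which appears in the answer above. Because rows are drawn without replacement, fraud rows are used up as the draw proceeds, so the observed share need not match the ratio of the total weights exactly.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
data = pd.DataFrame({
    "val": rng.integers(0, 1001, size=1000),
    "fraud": rng.binomial(1, 0.1, size=1000),
})
data["weights"] = data["fraud"] * 9 + 1  # weight 10 for fraud, 1 for non-fraud

# Repeat the weighted draw and record the fraud share of each 100-row sample.
shares = [
    data.sample(100, weights="weights", random_state=seed)["fraud"].mean()
    for seed in range(200)
]
print(f"mean fraud share: {np.mean(shares):.2f}")
print(f"min / max share:  {min(shares):.2f} / {max(shares):.2f}")
```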

Sampling with pandas
