Question

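I have two versions: control version A and test version B. These vectors contain the revenue per visitor. Version A has 3020 visitors who did not purchase, and version B has 2811. The revenue data come from a separate source: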

A <- c(rep(0, 3020), revenue_A[, 2])
B <- c(rep(0, 2811), revenue_B[, 2])

These are not normally distributed; they have a heavy right tail. length(rev_A[, 2]) and length(rev_B[, 2]) are each about 700, and the values range between 20 and 100.

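My approach is to bootstrap these vectors 1000 times, drawing 10% of the values each time, compute the mean revenue of each sample, and then run t.test on the results: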

aSS <- round(0.1 * length(A))
bSS <- round(0.1 * length(B))
bootA <- c()
bootB <- c()
for (i in 1:1000) {
  tempA <- sample(A, aSS, replace = TRUE) # 10% samples of the original data
  tempB <- sample(B, bSS, replace = TRUE)
  bootA <- c(bootA, mean(tempA)) # Calculate mean of the sample
  bootB <- c(bootB, mean(tempB))
}
hist(bootA)
hist(bootB)
# --> Seem to have normal distribution, let's do t.test
t.test(bootA, bootB)

Is this the right approach? I am having trouble finding tutorials that cover this kind of statistical calculation.
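
For comparison, here is a minimal sketch of a more conventional bootstrap for this kind of question. It is not a definitive analysis. It resamples each vector at its full length (rather than 10%) and reads a percentile interval off the bootstrapped difference in means. One reason to look at the difference directly: a t.test on the bootstrap means treats the 1000 replicates as if they were independent observations, so its p-value depends on the number of replicates chosen rather than only on the data. The data below are simulated placeholders (an assumption, since the real revenue_A and revenue_B are not shown), and the names n_boot, boot_diff, observed_diff and ci are illustrative.

```r
set.seed(1)  # only so that this illustration is reproducible

# Hypothetical stand-in data (assumption): about 700 purchases per version,
# revenue between 20 and 100, plus the non-purchasing visitors as zeros.
# runif() is a placeholder and does not reproduce the heavy right tail.
A <- c(rep(0, 3020), runif(700, min = 20, max = 100))
B <- c(rep(0, 2811), runif(700, min = 20, max = 100))

n_boot <- 1000  # number of bootstrap replicates

# Each replicate resamples every version at its full size, with replacement,
# and records the difference in mean revenue per visitor (B minus A).
boot_diff <- replicate(n_boot, {
  mean(sample(B, length(B), replace = TRUE)) -
    mean(sample(A, length(A), replace = TRUE))
})

# Difference in means observed in the original (here: simulated) data.
observed_diff <- mean(B) - mean(A)

# 95% percentile bootstrap interval for the difference in means.
ci <- quantile(boot_diff, probs = c(0.025, 0.975))

print(observed_diff)
print(ci)
```

If the resulting interval excludes 0, that would suggest a difference between the versions at roughly the 5% level. Whether a percentile interval is adequate for very heavy-tailed revenue data is a separate judgment call.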

Bootstrapping A/B test results (revenue-per-visitor vectors)

0 answers: