Automating manual R code to calculate p-values

Time: 2019-05-18 11:02:04

Tags: r

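I am trying to automate the R code below, which calculates p-values. The data are in CSV format (opened in Excel). For each section and version I have the number of clicks, and for each version I have the number of opens. I would appreciate it if someone could help me use a loop or any other approach.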

I have the data in .csv format:

Section          Version A   Version B   Version C   Version D
Section 1             2967        3353         495         559
Section 2             4840        4522         285         266
Section 3             2508        2250         228         205
Section 4             2093        1333         209         133
Section 5             1117         925         186         154
Main email body      12408       11458         282         260
Total email          13525       12383         271         248



Version   # Opens
A           18223
B           18368
C           18223
D           18368

Approach 1 (manually assigning the data from the CSV file):

S1_Click_A=2967 #(section 1, email A)
S1_Click_B=3353 #(section 1, email B)
S1_Click_C=495
S1_Click_D=559
S2_Click_A=4840
...
S5_Click_D=154
MainBody_Click_A=12408
...
MainBody_Click_D=260
TotalEmail_Click_A=13525
...
TotalEmail_Click_D=248

#no. email opens
Open_A=18223
Open_B=18368
Open_C=18223
Open_D=18368


# test whether the % of total clicks is comparable across versions
# section 1 test
S1ab <- prop.test(x = c(S1_Click_A,S1_Click_B), n = c(Open_A,Open_B))
...
S1cd <- prop.test(x = c(S1_Click_C,S1_Click_D), n = c(Open_C,Open_D))

#section 2 test
S2ab <- prop.test(x = c(S2_Click_A,S2_Click_B), n = c(Open_A,Open_B))
...
S2cd <- prop.test(x = c(S2_Click_C,S2_Click_D), n = c(Open_C,Open_D))

# similarly for sections 3, 4 and 5

#Main body test
MainBodyab <- prop.test(x = c(MainBody_Click_A, MainBody_Click_B), n = c(Open_A, Open_B))
MainBodyac <- prop.test(x = c(MainBody_Click_A, MainBody_Click_C), n = c(Open_A, Open_C))
...
MainBodycd <- prop.test(x = c(MainBody_Click_C, MainBody_Click_D), n = c(Open_C, Open_D))

# Total email test
TotalEmailab <- prop.test(x = c(TotalEmail_Click_A, TotalEmail_Click_B), n = c(Open_A, Open_B))
...
TotalEmailcd <- prop.test(x = c(TotalEmail_Click_C, TotalEmail_Click_D), n = c(Open_C, Open_D))

#FINAL P VALUE
S1ab$p.value
S1ac$p.value
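
Each call in approach 1 is a two-sample test of equal proportions: prop.test() compares clicks/opens for two versions and returns an "htest" object whose p.value element is the number collected at the end. A minimal, self-contained sketch for a single pair (Section 1, version A vs B), using the figures from the table above; the names clicks_ab and opens_ab are illustrative only and do not appear in the question:

```r
# Section 1 clicks and opens for versions A and B (figures from the question)
clicks_ab <- c(A = 2967, B = 3353)
opens_ab  <- c(A = 18223, B = 18368)

# Two-sample test of equal proportions; prop.test() applies the Yates
# continuity correction by default (correct = TRUE)
res <- prop.test(x = clicks_ab, n = opens_ab)

res$estimate   # the two click proportions, 2967/18223 and 3353/18368
res$p.value    # the single value collected per pair in approach 1
```

Every other line of approach 1 is the same call with a different pair of sections and versions, which is what makes it a good candidate for a loop.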

Approach 2

# no. email opens
open <- 
c(
Open_A=18223,
Open_B=18368,
Open_C=18223,
Open_D=18368
)

s1 <- c(
S1_Click_A=2967, #(section 1, email A)
S1_Click_B=3353, #(section 1, email B)
S1_Click_C=495,
S1_Click_D=559
)

open_comb <- combn(names(open), 2)
s1_comb <- combn(names(s1), 2)
res_names <-  combn(c("A", "B", "C", "D"), 2)

# test whether the % of total clicks is comparable across versions
# section 1 test
result1 <- list()
for(k in 1:length(open)){
result1[[paste0("s1", res_names[1, k], res_names[2, k])]] <- prop.test(x = 
s1[s1_comb[,k]], n = open[open_comb[,k]])
}
result_section1 <- c(result1$s1AB$p.value, result1$s1AC$p.value,
result1$s1AD$p.value, result1$s1BC$p.value, result1$s1BD$p.value,
result1$s1CD$p.value)
result_section1

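However, this automated code only gives p-values for the combinations AB, AC, AD and BC, and not for BD and CD. Maybe this is because of the length of open (i.e. only 4). Please help me fix this.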
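
The symptom matches the loop bound: 1:length(open) runs 4 times, but combn() on four versions returns choose(4, 2) = 6 columns, so the last two pairs (BD and CD) are never visited. A minimal sketch of a fix, assuming the opens and clicks are stored as vectors named by version (A to D) instead of Open_A and S1_Click_A, loops over the columns of the pair matrix:

```r
# Opens per version and Section 1 clicks per version (figures from the question);
# naming the elements A to D is a simplification of the question's names
open <- c(A = 18223, B = 18368, C = 18223, D = 18368)
s1   <- c(A = 2967,  B = 3353,  C = 495,   D = 559)

# combn() returns one column per pair: choose(4, 2) = 6 columns, not 4
pairs <- combn(names(open), 2)

result1 <- list()
for (k in seq_len(ncol(pairs))) {
  v <- pairs[, k]                       # e.g. c("A", "B")
  result1[[paste0("s1", v[1], v[2])]] <- prop.test(x = s1[v], n = open[v])
}

# Pull the p-value out of every test instead of listing them by hand
result_section1 <- sapply(result1, function(r) r$p.value)
result_section1
```

Tying the loop to ncol(pairs) keeps the iteration count equal to the number of pairs, so the code stays correct if a version is added or removed.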

I expect:
1. To read the input data directly from the CSV file, i.e. read the Section 1,
   Version A value (2967) and assign it to the variable S1_Click_A, and
   similarly for the others.
2. To fix the code so that it provides p-values for all combinations: AB, AC, AD, BC, BD and CD.
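
For point 1, the whole table can be read into a single matrix, so no separate variable per cell is needed. A minimal sketch, assuming a CSV laid out like the table above; the file here is a temporary one written only so the example runs on its own, and csv_file, clicks and click_mat are hypothetical names:

```r
# Hypothetical file: the question's table is written to a temporary CSV here
# only so the example is self-contained; replace csv_file with your own path.
csv_file <- tempfile(fileext = ".csv")
writeLines(c(
  "Section,Version A,Version B,Version C,Version D",
  "Section 1,2967,3353,495,559",
  "Section 2,4840,4522,285,266",
  "Section 3,2508,2250,228,205",
  "Section 4,2093,1333,209,133",
  "Section 5,1117,925,186,154",
  "Main email body,12408,11458,282,260",
  "Total email,13525,12383,271,248"
), csv_file)

# read.csv() uses check.names = TRUE, so "Version A" becomes "Version.A"
clicks <- read.csv(csv_file)

# One numeric matrix: rows = sections, columns = versions A to D
click_mat <- as.matrix(clicks[, -1])
rownames(click_mat) <- clicks$Section
colnames(click_mat) <- sub("Version.", "", colnames(click_mat), fixed = TRUE)

# The value the question stores by hand as S1_Click_A (2967)
S1_Click_A <- click_mat["Section 1", "A"]
```

Indexing by section and version name (click_mat["Section 2", "C"], and so on) replaces the 35 hand-assigned variables.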

dput(data)

structure(list(Section = structure(c(2L, 3L, 4L, 5L, 6L, 1L, 7L), .Label = 
c("Main email body", "Section 1", "Section 2", "Section 3", "Section 4", 
"Section 5", "Total email"), class = "factor"), Version.A = c(2967L, 4840L, 
2508L, 2093L, 1117L, 12408L, 13525L), Version.B = c(3353L, 4522L, 2250L, 
1333L, 925L, 11458L, 12383L), Version.C = c(495L, 285L, 228L, 209L, 186L, 
282L, 271L), Version.D = c(559L, 266L, 205L, 133L, 154L, 260L, 248L)), class 
= "data.frame", row.names = c(NA, -7L ))

1 answer:

Answer 0 (score: 0)

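Consider reshaping your data from its original wide format to long format. Then, for each Section, run prop.test across all combinations of Version. The code below builds a list of prop.test results (including, but not limited to, the p-values) covering all 6 combinations for each of the 7 sections.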

Data

txt <- '"Section" "Version A"   "Version B"   "Version C"   "Version D"
"Section 1"   2967    3353             495    559
"Section 2"   4840    4522             285    266
"Section 3"   2508    2250             228    205
"Section 4"   2093    1333             209    133
"Section 5"   1117    925              186    154
"Main email body"  12408   11458        282    260
"Total email" 13525   12383            271    248'

df <- read.table(text = txt, header = TRUE)

open_df <- data.frame(Version = c("A", "B", "C", "D"),
                      Open = c(18223, 18368, 18223, 18368))

reshape + by

# RESHAPE WIDE TO LONG
rdf <- reshape(df, idvar = "Section", varying = list(names(df)[-1]),
               times = names(df)[-1], v.names = "Value", timevar = "Version",
               new.row.names = 1:1E5, direction = "long")

rdf$Version  <- gsub("Version.", "", rdf$Version)

# SUBSET BY SECTION AND RUN prop.test ON ALL COMBS
prop_test_list <- by(rdf, rdf$Section, function(sub) {
    pairs <- combn(sub$Version, 2, simplify = FALSE)

    sapply(pairs, function(item) 
             prop.test(x = sub$Value[sub$Version %in% item], 
                       n = open_df$Open[open_df$Version %in% item])
          )
})
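
If only a compact table of p-values is needed, a different approach (not the one in this answer) skips the reshape: keep the data wide as a matrix and apply prop.test to every version pair of every row. A minimal sketch using the figures from the question; clicks, opens, pairs and p_values are illustrative names:

```r
# Clicks: rows = sections, columns = versions (figures from the question)
clicks <- matrix(
  c( 2967,  3353, 495, 559,
     4840,  4522, 285, 266,
     2508,  2250, 228, 205,
     2093,  1333, 209, 133,
     1117,   925, 186, 154,
    12408, 11458, 282, 260,
    13525, 12383, 271, 248),
  ncol = 4, byrow = TRUE,
  dimnames = list(c("Section 1", "Section 2", "Section 3", "Section 4",
                    "Section 5", "Main email body", "Total email"),
                  c("A", "B", "C", "D")))
opens <- c(A = 18223, B = 18368, C = 18223, D = 18368)

# All 6 version pairs (AB, AC, AD, BC, BD, CD), one per column
pairs <- combn(colnames(clicks), 2)
pair_names <- apply(pairs, 2, paste, collapse = "")

# For each section (row), run prop.test on every pair and keep the p-value;
# apply() returns pairs x sections, so t() gives sections x pairs
p_values <- t(apply(clicks, 1, function(row)
  apply(pairs, 2, function(v) prop.test(x = row[v], n = opens[v])$p.value)))
colnames(p_values) <- pair_names

round(p_values, 4)
```

The result is a 7 x 6 matrix with sections as rows and version pairs as columns, which can be written back out with write.csv() if needed.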
