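I am trying to automate the R code below, which calculates p-values. The data is in CSV format (opened in Excel). For each section I have the number of clicks per email version, and I also have the number of opens per version. I would appreciate help applying a loop or any other approach.
I have the data in .csv format: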
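Section          Version A  Version B  Version C  Version D
Section 1             2967       3353        495        559
Section 2             4840       4522        285        266
Section 3             2508       2250        228        205
Section 4             2093       1333        209        133
Section 5             1117        925        186        154
Main emailbody       12408      11458        282        260
Total email          13525      12383        271        248

Version   # Opens
A           18223
B           18368
C           18223
D           18368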
S1_Click_A=2967 #(section 1, email A)
S1_Click_B=3353 #(section 1, email B)
S1_Click_C=495
S1_Click_D=559
S2_Click_A=4840
...
S5_Click_D=154
MainBody_Click_A=12408
...
MainBody_Click_D=260
TotalEmail_Click_A=13525
...
TotalEmail_Click_D=248
# number of email opens
Open_A=18223
Open_B=18368
Open_C=18223
Open_D=18368
# test whether the click rate (% of opens that clicked) is comparable across versions
#section 1 test
S1ab <- prop.test(x = c(S1_Click_A,S1_Click_B), n = c(Open_A,Open_B))
...
S1cd <- prop.test(x = c(S1_Click_C,S1_Click_D), n = c(Open_C,Open_D))
#section 2 test
S2ab <- prop.test(x = c(S2_Click_A,S2_Click_B), n = c(Open_A,Open_B))
...
S2cd <- prop.test(x = c(S2_Click_C,S2_Click_D), n = c(Open_C,Open_D))
# similarly for sections 3, 4 and 5
#Main body test
MainBodyab <- prop.test(x = c(MainBody_Click_A,MainBody_Click_B), n = c(Open_A,Open_B))
MainBodyac <- prop.test(x = c(MainBody_Click_A,MainBody_Click_C), n = c(Open_A,Open_C))
...
MainBodycd <- prop.test(x = c(MainBody_Click_C,MainBody_Click_D), n = c(Open_C,Open_D))
#Total Email test
TotalEmailab <- prop.test(x = c(TotalEmail_Click_A,TotalEmail_Click_B), n = c(Open_A,Open_B))
...
TotalEmailcd <- prop.test(x = c(TotalEmail_Click_C,TotalEmail_Click_D), n = c(Open_C,Open_D))
#FINAL P VALUE
S1ab$p.value
S1ac$p.value
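For a single section, base R's `pairwise.prop.test()` runs `prop.test()` on every pair of groups and returns the p-values as a lower-triangular matrix. A minimal sketch for Section 1, assuming the click and open counts above (the names `s1_clicks`, `opens` and `pw` are illustrative). `p.adjust.method = "none"` is chosen here so the p-values match separate `prop.test()` calls; the default, `"holm"`, adjusts them for multiple comparisons.

```r
# Section 1 clicks and opens per version, as named vectors (illustrative names)
s1_clicks <- c(A = 2967, B = 3353, C = 495, D = 559)
opens     <- c(A = 18223, B = 18368, C = 18223, D = 18368)

# All six pairwise two-proportion tests in one call;
# "none" leaves the p-values unadjusted, like separate prop.test() calls
pw <- pairwise.prop.test(s1_clicks, opens, p.adjust.method = "none")

# Lower-triangular matrix: row = later version, column = earlier version
pw$p.value
pw$p.value["B", "A"]  # same test as prop.test(c(2967, 3353), c(18223, 18368))
```

Whether to keep `"none"` or use an adjustment such as `"holm"` depends on how the six comparisons per section are going to be interpreted.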
# Attempted automation (section 1 only)
# number of email opens
open <- c(
Open_A=18223,
Open_B=18368,
Open_C=18223,
Open_D=18368
)
s1 <- c(
S1_Click_A=2967, #(section 1, email A)
S1_Click_B=3353, #(section 1, email B)
S1_Click_C=495,
S1_Click_D=559
)
open_comb <- combn(names(open), 2)
s1_comb <- combn(names(s1), 2)
res_names <- combn(c("A", "B", "C", "D"), 2)
# test whether the click rate (% of opens that clicked) is comparable across versions
# section 1 test
result1 <- list()
for(k in 1:length(open)){
result1[[paste0("s1", res_names[1, k], res_names[2, k])]] <- prop.test(x =
s1[s1_comb[,k]], n = open[open_comb[,k]])
}
result_section1 <- c(result1$s1AB$p.value, result1$s1AC$p.value,
                     result1$s1AD$p.value, result1$s1BC$p.value,
                     result1$s1BD$p.value, result1$s1CD$p.value)
result_section1
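However, this automated code only gives p-values for the combinations AB, AC, AD and BC, not for BD and CD. Possibly this is because of the length of `open` (which is only 4). Please help me fix it.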
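The suspicion about the length of `open` is on the right track: `combn()` on four versions produces `choose(4, 2) = 6` pairs (six columns in `res_names`), but `1:length(open)` only runs `k` from 1 to 4. A minimal corrected sketch of the loop above, which iterates over the columns of the combination matrix and then collects the p-values with `sapply()` (the helper name `key` is illustrative):

```r
# number of email opens and Section 1 clicks, as in the code above
open <- c(Open_A = 18223, Open_B = 18368, Open_C = 18223, Open_D = 18368)
s1 <- c(S1_Click_A = 2967, S1_Click_B = 3353, S1_Click_C = 495, S1_Click_D = 559)

open_comb <- combn(names(open), 2)
s1_comb   <- combn(names(s1), 2)
res_names <- combn(c("A", "B", "C", "D"), 2)

result1 <- list()
# loop over all 6 pairs (columns of res_names), not over the 4 versions
for (k in seq_len(ncol(res_names))) {
  key <- paste0("s1", res_names[1, k], res_names[2, k])
  result1[[key]] <- prop.test(x = s1[s1_comb[, k]], n = open[open_comb[, k]])
}

# named vector of p-values: s1AB, s1AC, s1AD, s1BC, s1BD, s1CD
result_section1 <- sapply(result1, function(res) res$p.value)
result_section1
```

Because `sapply()` keeps the list names, the p-values stay labelled by pair without listing each `result1$...$p.value` by hand.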
What I want:
1. Read the input data directly from the CSV, i.e. read the Section 1 / Version A value (2967), assign it to the variable S1_Click_A, and do the same for the other cells.
2. Fix the code so that it gives p-values for all combinations: AB, AC, AD, BC, BD and CD.
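For point 1, instead of creating one variable per cell (`S1_Click_A`, ...), one common approach is to read the CSV into a data frame and index the counts by section and version. A minimal sketch, assuming the file has the layout of the table above with a header row `Section,Version A,...`. The real file name is not given; the sketch writes a small stand-in to a temporary file so it runs on its own, and the names `csv_file`, `clicks_df` and `clicks` are illustrative.

```r
# Write a small stand-in CSV (replace csv_file with the path to the real file)
csv_file <- tempfile(fileext = ".csv")
writeLines(c(
  "Section,Version A,Version B,Version C,Version D",
  "Section 1,2967,3353,495,559",
  "Section 2,4840,4522,285,266"
), csv_file)

clicks_df <- read.csv(csv_file)

# Matrix of clicks: rows = sections, columns = versions A-D
clicks <- as.matrix(clicks_df[, -1])
dimnames(clicks) <- list(clicks_df$Section, c("A", "B", "C", "D"))

clicks["Section 1", "A"]  # the value the question stores in S1_Click_A
```

`assign()` could create separate `S1_Click_A`-style variables if they are really needed, but keeping the counts in one matrix is what lets a loop or `apply()` visit every section and pair.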
Output of dput(data):
structure(list(Section = structure(c(2L, 3L, 4L, 5L, 6L, 1L, 7L), .Label =
c("Main email body", "Section 1", "Section 2", "Section 3", "Section 4",
"Section 5", "Total email"), class = "factor"), Version.A = c(2967L, 4840L,
2508L, 2093L, 1117L, 12408L, 13525L), Version.B = c(3353L, 4522L, 2250L,
1333L, 925L, 11458L, 12383L), Version.C = c(495L, 285L, 228L, 209L, 186L,
282L, 271L), Version.D = c(559L, 266L, 205L, 133L, 154L, 260L, 248L)), class
= "data.frame", row.names = c(NA, -7L ))
Answer
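Consider reshaping the data from its original wide format to long format. Then run prop.test by Section and across all combinations of Version. The code below builds a list with one element per section (7 in total), each holding the prop.test results (including, but not limited to, the p-values) for all 6 version pairs.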
Data
txt <- '"Section" "Version A" "Version B" "Version C" "Version D"
"Section 1" 2967 3353 495 559
"Section 2" 4840 4522 285 266
"Section 3" 2508 2250 228 205
"Section 4" 2093 1333 209 133
"Section 5" 1117 925 186 154
"Main emailbody" 12408 11458 282 260
"Total email" 13525 12383 271 248'
df <- read.table(text = txt, header = TRUE)
open_df <- data.frame(Version = c("A", "B", "C", "D"),
Open = c(18223, 18368, 18223, 18368))
reshape + by
# RESHAPE WIDE TO LONG
rdf <- reshape(df, idvar = "Section", varying = list(names(df)[-1]),
times = names(df)[-1], v.names = "Value", timevar = "Version",
new.row.names = 1:1E5, direction = "long")
rdf$Version <- gsub("Version.", "", rdf$Version)
# SUBSET BY SECTION AND RUN prop.test ON ALL COMBS
prop_test_list <- by(rdf, rdf$Section, function(sub) {
pairs <- combn(sub$Version, 2, simplify = FALSE)
sapply(pairs, function(item)
prop.test(x = sub$Value[sub$Version %in% item],
n = open_df$Open[open_df$Version %in% item])
)
})
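The `by()` result above holds the full `prop.test` objects, while the question only asks for p-values. As a variation on the same idea (not the answer's exact code), here is a sketch that skips the reshape and returns one row per section and version pair. It assumes `df` and `open_df` with the layout built above; they are re-created here with two sections so the block runs on its own, and `clicks`, `opens`, `pairs` and `p_values` are illustrative names.

```r
# Wide data as in the answer (two sections shown for brevity)
df <- data.frame(Section   = c("Section 1", "Section 2"),
                 Version.A = c(2967, 4840), Version.B = c(3353, 4522),
                 Version.C = c(495, 285),   Version.D = c(559, 266),
                 stringsAsFactors = FALSE)
open_df <- data.frame(Version = c("A", "B", "C", "D"),
                      Open = c(18223, 18368, 18223, 18368),
                      stringsAsFactors = FALSE)

clicks <- as.matrix(df[, -1])
dimnames(clicks) <- list(df$Section, open_df$Version)
opens <- setNames(open_df$Open, open_df$Version)

pairs <- combn(open_df$Version, 2)  # 2 x 6 matrix: AB, AC, AD, BC, BD, CD

# One row per (section, pair) holding the prop.test p-value
p_values <- do.call(rbind, lapply(rownames(clicks), function(sec) {
  p <- apply(pairs, 2, function(v) prop.test(clicks[sec, v], opens[v])$p.value)
  data.frame(Section = sec, Pair = paste0(pairs[1, ], pairs[2, ]), p.value = p)
}))
p_values
```

With all seven sections in `df`, the result has 7 × 6 = 42 rows.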