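I'm hoping someone can give me some guidance or help. I have a dataset of people who were tested for an infection over three years. Some individuals, but not all, were sampled in more than one year (so they represent repeated measures). I want to determine whether the prevalence of infection changed over time, but I'm having trouble identifying an appropriate test. A simple contingency test violates the assumption of independence, because individuals are repeated across years. I don't think the Cochran-Mantel-Haenszel test or McNemar's chi-squared test is appropriate, but feel free to correct me if I'm wrong. Here is the dataset I'm working with; the "AnID" variable is a factor representing an individual person (so if an individual was sampled in multiple years, you will see that number repeated 2 or 3 times).
I think one viable option would be to randomly resample the data many times (without replacement), including each individual only once per resample, and run the contingency test across years each time. If the null hypothesis of no difference is rejected in at least 95% of the resamples, then I could reliably claim that there is a difference. I'm not proficient enough with R to write my own code for this. Thanks in advance for any help you can offer.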
dput(example)
structure(list(AnID = structure(c(37L, 37L, 45L, 45L, 45L, 55L,
55L, 62L, 62L, 68L, 68L, 1L, 1L, 2L, 3L, 3L, 4L, 9L, 9L, 18L,
18L, 18L, 19L, 19L, 19L, 20L, 20L, 21L, 22L, 22L, 23L, 24L, 24L,
24L, 25L, 25L, 25L, 26L, 27L, 28L, 28L, 28L, 29L, 29L, 29L, 30L,
31L, 32L, 32L, 33L, 34L, 35L, 36L, 38L, 38L, 39L, 39L, 40L, 41L,
41L, 42L, 42L, 42L, 43L, 43L, 43L, 44L, 46L, 46L, 46L, 47L, 47L,
47L, 48L, 48L, 48L, 49L, 49L, 49L, 50L, 51L, 52L, 52L, 53L, 53L,
54L, 54L, 56L, 56L, 57L, 57L, 57L, 58L, 59L, 60L, 61L, 63L, 64L,
65L, 66L, 67L, 69L, 70L, 71L, 72L, 73L, 74L, 74L, 5L, 6L, 7L,
8L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L), .Label = c("10",
"11", "12", "13", "136", "137", "138", "139", "14", "140", "141",
"142", "143", "144", "145", "146", "147", "26", "27", "28", "29",
"30", "31", "37", "38", "39", "40", "41", "42", "43", "44", "45",
"46", "47", "48", "49", "5", "50", "51", "52", "53", "57", "58",
"59", "6", "60", "61", "62", "63", "64", "65", "66", "67", "69",
"7", "70", "71", "72", "75", "76", "77", "8", "82", "83", "84",
"85", "86", "9", "90", "94", "95", "96", "97", "98"), class = "factor"),
    year = structure(c(1L, 2L, 1L, 2L, 3L, 1L, 2L, 2L, 3L, 2L,
    3L, 2L, 3L, 2L, 2L, 3L, 2L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
    2L, 3L, 2L, 1L, 2L, 2L, 1L, 2L, 3L, 1L, 2L, 3L, 2L, 2L, 1L,
    2L, 3L, 1L, 2L, 3L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 3L,
    2L, 3L, 2L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 2L, 1L, 2L, 3L,
    1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 2L, 2L, 1L, 2L, 1L, 2L,
    1L, 2L, 1L, 2L, 1L, 2L, 3L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
    3L, 3L, 3L, 3L, 3L), .Label = c("2012", "2013", "2014"), class = "factor"),
    value = c("Pos", "Pos", "Pos", "Pos", "Pos", "Neg", "Neg",
    "Pos", "Pos", "Pos", "Pos", "Pos", "Pos", "Neg", "Neg", "Pos",
    "Neg", "Pos", "Pos", "Neg", "Pos", "Pos", "Neg", "Neg", "Neg",
    "Neg", "Neg", "Neg", "Pos", "Pos", "Pos", "Pos", "Pos", "Pos",
    "Neg", "Pos", "Pos", "Neg", "Neg", "Neg", "Neg", "Pos", "Pos",
    "Pos", "Pos", "Neg", "Neg", "Pos", "Pos", "Neg", "Pos", "Neg",
    "Pos", "Neg", "Neg", "Neg", "Neg", "Neg", "Neg", "Neg", "Pos",
    "Pos", "Pos", "Neg", "Pos", "Pos", "Neg", "Neg", "Pos", "Neg",
    "Neg", "Neg", "Neg", "Neg", "Neg", "Neg", "Neg", "Pos", "Pos",
    "Neg", "Neg", "Neg", "Pos", "Pos", "Pos", "Pos", "Pos", "Neg",
    "Neg", "Neg", "Pos", "Pos", "Neg", "Neg", "Neg", "Neg", "Neg",
    "Neg", "Pos", "Neg", "Neg", "Neg", "Neg", "Neg", "Neg", "Neg",
    "Pos", "Pos", "Neg", "Neg", "Neg", "Pos", "Pos", "Pos", "Neg",
    "Neg", "Pos", "Neg", "Pos", "Neg")), .Names = c("AnID", "year",
"value"), row.names = 187:306, class = "data.frame")
Answer 0 (score: 1)
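Note that a sound experimental/test design includes a sample size calculation done in advance, so that you maximize the chance of detecting a statistically significant difference if one exists. (For details see https://en.wikipedia.org/wiki/Sample_size_determination and https://en.wikipedia.org/wiki/Statistical_power.)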
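As a rough illustration of such a calculation, base R's power.prop.test() can solve for the per-group sample size needed to detect a difference between two proportions. The planning values below (25% versus 50% prevalence, 5% significance level, 80% power) are assumptions chosen only for illustration and are not derived from the question's data; the function also assumes two independent groups, which is a simplification of the three-year design here.

```r
# Hypothetical planning values (NOT taken from the question's data):
# prevalence of 25% in one year versus 50% in another.
pw <- power.prop.test(p1 = 0.25, p2 = 0.50,  # assumed true proportions
                      sig.level = 0.05,      # two-sided type I error rate
                      power = 0.80)          # desired power

pw             # prints the full calculation
ceiling(pw$n)  # required number of individuals per group, rounded up
```

Smaller assumed differences between p1 and p2 require larger samples, so it is worth trying a few plausible values.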
If all of your users had before/after measurements (e.g. test/control), you could use McNemar's test to compare the proportions (see https://en.wikipedia.org/wiki/McNemar's_test).
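For reference, a minimal sketch of McNemar's test in base R, assuming every individual had been tested in exactly two years. The 2x2 counts are hypothetical and only show the shape of the input that mcnemar.test() expects: rows are the status at the first time point and columns the status at the second.

```r
# Hypothetical paired counts (invented for illustration, not from the question):
# each cell counts individuals by their status in the two years.
paired <- matrix(c(20,  5,
                   12, 30),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(first  = c("Pos", "Neg"),
                                 second = c("Pos", "Neg")))

# Only the discordant cells (Pos -> Neg and Neg -> Pos) drive the test.
res <- mcnemar.test(paired)
res
```

Note that this handles two time points only, which is one more reason it does not fit the three-year data directly.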
However, not all users have repeated measurements, so I chose to randomly pick one year for each user, which gives 3 independent samples of values.
Assuming dt is your dataset...
library(dplyr)

set.seed(1) # fix the seed so the random sampling is reproducible

dt %>%
  mutate(Pos = ifelse(value == "Pos", 1, 0)) %>% # create a binary variable to flag positives
  group_by(AnID) %>%                             # for each user
  sample_n(1) %>%                                # get one row/value randomly
  group_by(year) %>%                             # for each year
  summarise(N = n(),                             # get number of users
            N_Pos = sum(Pos),                    # get number of positive users
            Prc_Pos = mean(Pos)) %>%             # get proportion of positives
  print() -> tbl1                                # print and save it

# # A tibble: 3 × 4
#     year     N N_Pos   Prc_Pos
#   <fctr> <int> <dbl>     <dbl>
# 1   2012    23     6 0.2608696
# 2   2013    27     9 0.3333333
# 3   2014    24    13 0.5416667
After looking at the proportions for each year above, you can run a comparison of proportions:
# run the statistical comparison of proportions
prop.test(tbl1$N_Pos, tbl1$N)
# 3-sample test for equality of proportions without continuity correction
#
# data: tbl1$N_Pos out of tbl1$N
# X-squared = 4.3038, df = 2, p-value = 0.1163
# alternative hypothesis: two.sided
# sample estimates:
# prop 1 prop 2 prop 3
# 0.2608696 0.3333333 0.5416667
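The p-value here (0.1163) indicates that, for this particular random draw, we have no evidence that the years differ in the probability of a positive result.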
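As a cross-check, the same test can be reproduced from the counts printed in the table above, without needing dt. For more than two groups, prop.test() is equivalent to a Pearson chi-squared test on the years-by-outcome table; the sketch below simply re-enters the positive counts and totals by hand.

```r
# Counts copied from the summary table above (positives and totals per year)
n_pos <- c(`2012` = 6,  `2013` = 9,  `2014` = 13)
n_tot <- c(`2012` = 23, `2013` = 27, `2014` = 24)

# 3 x 2 table: one row per year, columns = positives and negatives
counts <- cbind(Pos = n_pos, Neg = n_tot - n_pos)

chi <- chisq.test(counts)        # no continuity correction for a 3 x 2 table
chi$statistic                    # X-squared of about 4.3038
chi$p.value                      # about 0.1163

# prop.test on the same counts gives the same statistic
all.equal(unname(chi$statistic),
          unname(prop.test(n_pos, n_tot)$statistic))
```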
If you do find a difference, you can run pairwise comparisons between the years.
# run pairwise comparisons
pairwise.prop.test(tbl1$N_Pos, tbl1$N)
# Pairwise comparisons using Pairwise comparison of proportions
#
# data: tbl1$N_Pos out of tbl1$N
#
#   1    2
# 2 0.80 -
# 3 0.29 0.45
#
# P value adjustment method: holm
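The output here is 3 p-values (one for each of the 3 pairwise comparisons). As expected, all of them indicate no difference between the years.
You can wrap the above process in a function and run N simulations, then check in how many of the simulations you get a statistically significant result.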
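A minimal sketch of that idea in base R follows. To keep it self-contained it runs on a small simulated data frame (toy) with the same three columns as the question's data; toy, one_draw_p and n_sim are names introduced here for illustration, and on the real data you would pass dt instead. Treat the share of significant draws as a descriptive summary of how sensitive the result is to the random choice of year, not as a formal p-value.

```r
set.seed(42)

# Hypothetical toy data: 60 individuals, each sampled in 1 to 3 of the years
years <- c("2012", "2013", "2014")
toy <- do.call(rbind, lapply(1:60, function(id) {
  yrs <- sort(sample(years, size = sample(1:3, 1)))
  data.frame(AnID  = id,
             year  = yrs,
             value = sample(c("Pos", "Neg"), length(yrs), replace = TRUE))
}))

# One simulation: keep one random row per individual, then test the proportions
one_draw_p <- function(dt) {
  rows_by_id <- split(seq_len(nrow(dt)), dt$AnID, drop = TRUE)
  idx <- vapply(rows_by_id,
                function(i) i[sample.int(length(i), 1)],  # safe for single rows
                integer(1))
  sub <- dt[idx, ]
  # years x (Pos, Neg) table; prop.test reads it as successes and failures
  tab <- table(sub$year, factor(sub$value, levels = c("Pos", "Neg")))
  prop.test(tab)$p.value
}

n_sim <- 500
p_values <- replicate(n_sim, one_draw_p(toy))

mean(p_values < 0.05)  # share of draws with a significant result
summary(p_values)      # spread of p-values across draws
```

Because the toy outcomes are generated independently of year, most draws should be non-significant here; with the real data, a share close to 0 or close to 1 tells you the conclusion does not depend much on which year was drawn for each individual.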