Question

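I was hoping someone could give me some guidance or help. I have a dataset of individuals who were tested for an infection over three years. Some individuals, but not all, were sampled in more than one year (so they represent repeated measures). I would like to determine whether the prevalence of infection changed over time, but I am having trouble identifying the appropriate test. A simple contingency test violates the assumption of independence because some individuals were sampled repeatedly across years. I don't think a Cochran-Mantel-Haenszel test or a McNemar chi-square test is appropriate, but please feel free to correct me if I'm wrong. Below is the dataset I'm working with. The "AnID" variable is a factor identifying each individual, so if an individual was sampled in several years, that number appears 2 or 3 times.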

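One option I thought of is to randomly resample the data many times (without replacement), each time including every individual only once, and run a contingency test across years on each resample. If the null hypothesis of no difference is rejected in at least 95% of the resamples, I could reasonably claim there is a difference. I'm not yet proficient enough in R to write the code for this myself. Thanks in advance for any help you can provide.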

dput(example)
structure(list(AnID = structure(c(37L, 37L, 45L, 45L, 45L, 55L, 
55L, 62L, 62L, 68L, 68L, 1L, 1L, 2L, 3L, 3L, 4L, 9L, 9L, 18L, 
18L, 18L, 19L, 19L, 19L, 20L, 20L, 21L, 22L, 22L, 23L, 24L, 24L, 
24L, 25L, 25L, 25L, 26L, 27L, 28L, 28L, 28L, 29L, 29L, 29L, 30L, 
31L, 32L, 32L, 33L, 34L, 35L, 36L, 38L, 38L, 39L, 39L, 40L, 41L, 
41L, 42L, 42L, 42L, 43L, 43L, 43L, 44L, 46L, 46L, 46L, 47L, 47L, 
47L, 48L, 48L, 48L, 49L, 49L, 49L, 50L, 51L, 52L, 52L, 53L, 53L, 
54L, 54L, 56L, 56L, 57L, 57L, 57L, 58L, 59L, 60L, 61L, 63L, 64L, 
65L, 66L, 67L, 69L, 70L, 71L, 72L, 73L, 74L, 74L, 5L, 6L, 7L, 
8L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L), .Label = c("10", 
"11", "12", "13", "136", "137", "138", "139", "14", "140", "141", 
"142", "143", "144", "145", "146", "147", "26", "27", "28", "29", 
"30", "31", "37", "38", "39", "40", "41", "42", "43", "44", "45", 
"46", "47", "48", "49", "5", "50", "51", "52", "53", "57", "58", 
"59", "6", "60", "61", "62", "63", "64", "65", "66", "67", "69", 
"7", "70", "71", "72", "75", "76", "77", "8", "82", "83", "84", 
"85", "86", "9", "90", "94", "95", "96", "97", "98"), class = "factor"), 
    year = structure(c(1L, 2L, 1L, 2L, 3L, 1L, 2L, 2L, 3L, 2L, 
    3L, 2L, 3L, 2L, 2L, 3L, 2L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 
    2L, 3L, 2L, 1L, 2L, 2L, 1L, 2L, 3L, 1L, 2L, 3L, 2L, 2L, 1L, 
    2L, 3L, 1L, 2L, 3L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 3L, 
    2L, 3L, 2L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 2L, 1L, 2L, 3L, 
    1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 2L, 2L, 1L, 2L, 1L, 2L, 
    1L, 2L, 1L, 2L, 1L, 2L, 3L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
    3L, 3L, 3L, 3L, 3L), .Label = c("2012", "2013", "2014"), class = "factor"), 
    value = c("Pos", "Pos", "Pos", "Pos", "Pos", "Neg", "Neg", "Pos", 
    "Pos", "Pos", "Pos", "Pos", "Pos", "Neg", "Neg", "Pos", 
    "Neg", "Pos", "Pos", "Neg", "Pos", "Pos", "Neg", "Neg", 
    "Neg", "Neg", "Neg", "Neg", "Pos", "Pos", "Pos", "Pos", 
    "Pos", "Pos", "Neg", "Pos", "Pos", "Neg", "Neg", "Neg", 
    "Neg", "Pos", "Pos", "Pos", "Pos", "Neg", "Neg", "Pos", 
    "Pos", "Neg", "Pos", "Neg", "Pos", "Neg", "Neg", "Neg", 
    "Neg", "Neg", "Neg", "Neg", "Pos", "Pos", "Pos", "Neg", 
    "Pos", "Pos", "Neg", "Neg", "Pos", "Neg", "Neg", "Neg", 
    "Neg", "Neg", "Neg", "Neg", "Neg", "Pos", "Pos", "Neg", 
    "Neg", "Neg", "Pos", "Pos", "Pos", "Pos", "Pos", "Neg", 
    "Neg", "Neg", "Pos", "Pos", "Neg", "Neg", "Neg", "Neg", 
    "Neg", "Neg", "Pos", "Neg", "Neg", "Neg", "Neg", "Neg", 
    "Neg", "Neg", "Pos", "Pos", "Neg", "Neg", "Neg", "Pos", 
    "Pos", "Pos", "Neg", "Neg", "Pos", "Neg", "Pos", "Neg")), .Names = c("AnID", 
"year", "value"), row.names = 187:306, class = "data.frame")

Answer 1

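Note that a study design should ideally include a sample size (power) calculation in advance, to maximize the chance of detecting a statistically significant difference if one exists. (For details see https://en.wikipedia.org/wiki/Sample_size_determination and https://en.wikipedia.org/wiki/Statistical_power.)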
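As a rough illustration of such a calculation, base R's power.prop.test() solves for the per-group sample size needed to compare two proportions. The prevalences 0.3 and 0.5 below are hypothetical values chosen only for this sketch, not estimates from the dataset.

```r
# Required sample size per group to detect a difference between two
# prevalences. p1 and p2 are hypothetical values for illustration only.
res <- power.prop.test(p1 = 0.3, p2 = 0.5, sig.level = 0.05, power = 0.8)

# res$n is the number of individuals needed *per group*; round it up
ceiling(res$n)
```

This formula assumes two independent groups, so it does not account for individuals measured in several years.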

If every individual had been measured twice, before and after (e.g. test/control), you could compare the paired proportions with McNemar's test (see https://en.wikipedia.org/wiki/McNemar's_test).
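For illustration, this is how McNemar's test could be run in base R if each individual had been tested in two years. The eight individuals below are made-up toy data; only the discordant pairs (Pos then Neg, or Neg then Pos) contribute to the statistic.

```r
# Toy paired data (hypothetical): results of the same 8 individuals
# in a first and a second year. Fixed levels keep the 2x2 table square.
first  <- factor(c("Pos", "Pos", "Pos", "Neg", "Neg", "Neg", "Neg", "Neg"),
                 levels = c("Neg", "Pos"))
second <- factor(c("Pos", "Neg", "Neg", "Pos", "Pos", "Pos", "Pos", "Neg"),
                 levels = c("Neg", "Pos"))

# Cross-tabulate the pairs: rows = first year, columns = second year
paired_tab <- table(first, second)
print(paired_tab)

# McNemar's test (continuity correction is applied by default)
res <- mcnemar.test(paired_tab)
print(res)
```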

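However, not all individuals have repeated measurements, so I randomly select one year for each individual. This gives three independent samples, one per year.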

Assuming dt is your dataset:

library(dplyr)

set.seed(1)   # makes the random sampling reproducible (exact draws can differ across R versions)

dt %>%                      
  mutate(Pos = ifelse(value == "Pos", 1, 0)) %>%   # create a binary variable to flag positives
  group_by(AnID) %>%                               # for each user
  sample_n(1) %>%                                  # get one row/value randomly
  group_by(year) %>%                               # for each year
  summarise(N = n(),                               # get number of users
            N_Pos = sum(Pos),                      # get number of positive users
            Prc_Pos = mean(Pos)) %>%               # get proportion of positives
  print() -> tbl1                                  # print and save it

# # A tibble: 3 × 4
#     year     N N_Pos   Prc_Pos
#   <fctr> <int> <dbl>     <dbl>
# 1   2012    23     6 0.2608696
# 2   2013    27     9 0.3333333
# 3   2014    24    13 0.5416667

After inspecting the yearly proportions above, you can test whether they are equal:

# run the statistical comparison of proportions
prop.test(tbl1$N_Pos, tbl1$N)

# 3-sample test for equality of proportions without continuity correction
# 
# data:  tbl1$N_Pos out of tbl1$N
# X-squared = 4.3038, df = 2, p-value = 0.1163
# alternative hypothesis: two.sided
# sample estimates:
#    prop 1    prop 2    prop 3 
# 0.2608696 0.3333333 0.5416667

The p-value here (0.1163) means that, at the usual 5% level, there is no evidence that the years differ in the probability of testing positive.
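To make the test less of a black box, the sketch below recomputes the X-squared statistic of prop.test from the counts in tbl1 (6/23, 9/27, 13/24). It is the Pearson chi-square statistic of the 2 x 3 table, with expected counts based on the pooled proportion; for more than two groups prop.test applies no continuity correction.

```r
# Counts copied from tbl1 above (positives and individuals per year)
n_pos <- c(6, 9, 13)
n     <- c(23, 27, 24)

# Pooled proportion of positives under H0: same prevalence every year
p_pool <- sum(n_pos) / sum(n)

# Pearson chi-square over the positive and negative cells of each year
expected_pos <- n * p_pool
expected_neg <- n * (1 - p_pool)
x2 <- sum((n_pos - expected_pos)^2 / expected_pos +
          ((n - n_pos) - expected_neg)^2 / expected_neg)

# k groups -> k - 1 degrees of freedom
p_value <- pchisq(x2, df = length(n) - 1, lower.tail = FALSE)

c(X_squared = x2, p_value = p_value)
```

The two numbers should agree with the X-squared = 4.3038 and p-value = 0.1163 reported by prop.test above.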

If you do find a difference, you can follow up with pairwise comparisons between years:

# run pairwise comparisons 
pairwise.prop.test(tbl1$N_Pos, tbl1$N)

# Pairwise comparisons using Pairwise comparison of proportions 
# 
# data:  tbl1$N_Pos out of tbl1$N 
# 
# 1    2   
# 2 0.80 -   
# 3 0.29 0.45
# 
# P value adjustment method: holm

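The output consists of three p-values, one for each pair of years, adjusted with Holm's method. As expected after the overall test, none of them indicates a significant difference between years.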
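To see what "P value adjustment method: holm" means, the sketch below runs each two-year prop.test separately (with its default continuity correction, which is what pairwise.prop.test passes through) and then applies p.adjust(method = "holm") to the raw p-values, again using the counts from tbl1.

```r
n_pos <- c(6, 9, 13)   # positives per year, from tbl1 above
n     <- c(23, 27, 24) # individuals per year, from tbl1 above

# Raw p-values in the order pairwise.prop.test stores them:
# year 2 vs 1, year 3 vs 1, year 3 vs 2
pairs <- list(c(2, 1), c(3, 1), c(3, 2))
raw_p <- sapply(pairs, function(ij) prop.test(n_pos[ij], n[ij])$p.value)

# Holm's step-down adjustment controls the family-wise error rate
holm_p <- p.adjust(raw_p, method = "holm")
round(holm_p, 2)
```

The rounded values should match the 0.80, 0.29 and 0.45 in the table above.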

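You can wrap the procedure above in a function and repeat it N times (simulations), then check in how many of the simulations you obtain a statistically significant result.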
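A minimal sketch of that idea follows. The helper names one_draw_p_value() and simulate_p_values() are hypothetical, and the one-row-per-individual draw is done in base R rather than dplyr so it runs without extra packages. It assumes the data frame has the AnID, year and value columns shown in the question. Keep in mind that the share of significant draws describes how sensitive the result is to which year is kept for each individual; it is not itself a p-value.

```r
# Hypothetical helper: keep one random row per individual (AnID), then
# test equality of yearly prevalence with prop.test(); returns the p-value.
one_draw_p_value <- function(dt) {
  rows_by_id <- split(seq_len(nrow(dt)), factor(dt$AnID), drop = TRUE)
  # idx[sample.int(...)] avoids sample()'s 1:n behaviour for length-1 idx
  picked <- vapply(rows_by_id,
                   function(idx) idx[sample.int(length(idx), 1)],
                   integer(1))
  draw <- dt[picked, ]
  year <- droplevels(factor(draw$year))
  n_pos <- as.vector(tapply(draw$value == "Pos", year, sum))
  n     <- as.vector(table(year))
  # small yearly counts can trigger the chi-square approximation warning
  suppressWarnings(prop.test(n_pos, n)$p.value)
}

# Hypothetical helper: repeat the random draw n_sim times, collect p-values
simulate_p_values <- function(dt, n_sim = 1000) {
  replicate(n_sim, one_draw_p_value(dt))
}

# Usage with the question's data (assumed to be loaded as `dt`):
# set.seed(1)
# p <- simulate_p_values(dt, n_sim = 1000)
# mean(p < 0.05)   # share of draws with p < 0.05
```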

Contingency test on data with repeated measures
