
Question

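For some strange reason, when I compute a running correlation I do not get the same p-value for the same estimate/correlation value.

My goal is to compute a running Spearman correlation between two vectors in the same data.frame (subject1 and subject2 in the example below). My window (the length of each vector) and stride (the jump/step between windows) are constant. So, looking at the formula below (from Wikipedia), I should get the same critical t, and therefore the same p-value, for the same Spearman correlation: n is the same (the window size is fixed) and r is the same. Yet the p-values I end up with differ.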
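
The formula itself did not survive on this page. It is presumably the t approximation from the Wikipedia article on Spearman's rank correlation coefficient, where r is the Spearman estimate and n is the window size; under the null hypothesis t follows approximately a Student's t distribution with n - 2 degrees of freedom:

```latex
t = r \sqrt{\frac{n - 2}{1 - r^{2}}}
```

Since t depends only on r and n, two windows of the same width with the same r should give the same t and hence the same p-value.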

#Needed pkgs    
require(tidyverse)
require(pspearman)
require(gtools)

#Sample data
set.seed(528)
subject1 <- rnorm(40, mean = 85, sd = 5)

set.seed(528)
subject2 <- c(
  dplyr::lag(subject1[1:21]) - 10,  # dplyr::lag shifts by one and pads with NA
  rnorm(n = 6, mean = 85, sd = 5), 
  dplyr::lag(subject1[length(subject1):28]) - 10)

df <- data.frame(subject1 = subject1, 
                 subject2 = subject2) %>% 
  rowid_to_column(var = "Time") 

df[is.na(df)] <- subject1[1] - 10

rm(subject1, subject2)

#Function for Spearman
psSpearman <- function(x, y) 
{
  out <- pspearman::spearman.test(x, y,
                                  alternative = "two.sided", 
                                  approximation = "t-distribution") %>% 
    broom::tidy()
  return(data.frame(estimate = out$estimate,
                    statistic = out$statistic,
                    p.value = out$p.value))
}

#Running correlation along the subjects
dfRunningCor <- running(df$subject1, df$subject2, 
                        fun = psSpearman,
                        width = 20,
                        allow.fewer = FALSE, 
                        by = 1,
                        pad = FALSE, 
                        align = "right") %>% 
  t() %>% 
  as.data.frame() 

#Arranging the Results into easy to handle data.frame 
Results <- do.call(rbind.data.frame, dfRunningCor) %>% 
  t() %>%
  as.data.frame() %>%
  rownames_to_column(var = "Win") %>% 
  gather(CorValue, Value, -Win) %>% 
  separate(Win, c("fromIndex", "toIndex")) %>%
  mutate(fromIndex = as.numeric(substring(fromIndex, 2)),
         toIndex = as.numeric(toIndex)) %>%
  spread(CorValue, Value) %>% 
  arrange(fromIndex) %>% 
  select(fromIndex, toIndex, estimate, statistic, p.value)
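
As a cross-check, the p-value implied by the t approximation can be recomputed directly from an estimate and the window width. The following is a minimal base R sketch, assuming the two-sided t approximation and n = 20 as in the call to running() above; p_from_rho is a hypothetical helper introduced here for illustration, not a function from pspearman.

```r
# Hypothetical helper: two-sided p-value for Spearman's rho using the
# t approximation, t = rho * sqrt((n - 2) / (1 - rho^2)), with n - 2 df.
p_from_rho <- function(rho, n) {
  t_stat <- rho * sqrt((n - 2) / (1 - rho^2))
  2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)
}

# The same rho and the same n always give the same p-value.
p_a <- p_from_rho(0.62, n = 20)
p_b <- p_from_rho(0.62, n = 20)
identical(p_a, p_b)
```

If the p.value column of Results came out numeric, comparing it with p_from_rho(Results$estimate, 20) would show whether the reported p-values actually deviate from this approximation.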

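My problem is that when I plot the estimate (Spearman's rho; Results$estimate) against the window number (fromIndex) and color by p-value, I should get something like a "tunnel"/"path" of the same color across regions of the same height, and I don't. For example, in the figure below, points at the same height inside the red circles should have the same color, but they do not.

Code for the plot: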

Results %>% 
  ggplot(aes(fromIndex, estimate, color = p.value)) + 
  geom_line()

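What I have found so far is that it might be due to:

1. Functions like Hmisc::rcorr() tend not to give the same p.value in small samples or when there are many ties. That is why I use pspearman::spearman.test, which accounts for this.
2. Small sample size. I tried a larger sample size and still get the same problem. I also tried rounding my p-values and still get the same problem.

Thanks for your help!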

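Edit

Could ggplot be "faking" the coloring? Could it be that ggplot simply carries the "last" color forward until the next point? Is that why I get "light blue" from point 5 to point 6 but "dark blue" from point 7 to point 8?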
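
One way to look into this is to overlay points on the line so that every observation shows its own color. As far as I know, when color varies along a line, geom_line() draws each segment in the color of its starting point rather than blending toward the next one, so a segment's color says nothing about the point it ends at. The sketch below assumes ggplot2 is installed and uses a small made-up data frame (toy) as a stand-in for Results; its p-values come from the t approximation with n = 20, not from the data in the question.

```r
library(ggplot2)

# Made-up stand-in for `Results`; not the data from the question.
n   <- 20
est <- c(0.30, 0.45, 0.45, 0.62, 0.45, 0.30)
toy <- data.frame(
  fromIndex = seq_along(est),
  estimate  = est,
  p.value   = 2 * pt(abs(est) * sqrt((n - 2) / (1 - est^2)),
                     df = n - 2, lower.tail = FALSE)
)

p_check <- ggplot(toy, aes(fromIndex, estimate, color = p.value)) +
  geom_line() +          # segment color is taken from the segment's first point
  geom_point(size = 3)   # each point is drawn in the color of its own p.value

# print(p_check) draws the plot
```

In such a plot, points at the same height should share a color even when the line segments next to them do not.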

Answer 1

The results you get for the p.value variable are consistent with the estimate values. You can check this as follows:

Results$orderestimate <- order(-abs(Results$estimate))
Results$orderp.value <- order(abs(Results$p.value))
identical(Results$orderestimate ,Results$orderp.value)
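
A more direct variant of this check is to group the rows by estimate and count how many distinct p-values each group contains. The sketch below is base R only and runs on a made-up stand-in (toy) rather than on Results; the rounding to 8 digits is an assumption, meant to keep floating-point noise from splitting values that are equal in practice.

```r
# Made-up stand-in for `Results` with some repeated estimates; p-values use
# the t approximation for windows of n = 20. Not the data from the question.
n   <- 20
est <- c(0.30, 0.45, 0.45, 0.62, 0.45, 0.30)
toy <- data.frame(
  estimate = est,
  p.value  = 2 * pt(abs(est) * sqrt((n - 2) / (1 - est^2)),
                    df = n - 2, lower.tail = FALSE)
)

# Round before grouping so tiny numerical differences do not create new groups.
est_key <- round(toy$estimate, 8)
p_key   <- round(toy$p.value, 8)

# Number of distinct p-values for each distinct estimate.
p_per_estimate <- tapply(p_key, est_key, function(p) length(unique(p)))

# TRUE means equal estimates always carry equal p-values.
all(p_per_estimate == 1)
```

The same grouping applied to Results$estimate and Results$p.value would show directly whether any estimate is paired with more than one p-value.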

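I don't think you should map p.value to color in the plot; it is an unnecessary visual distraction and hard to interpret.

If I were you, I would plot only the p.value, and perhaps add points that indicate the sign of the estimate.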

p <- Results %>% 
  ggplot(aes(fromIndex,  p.value)) + 
  geom_line()

# If you want to display the sign of the estimate
Results$estimate.sign <- as.factor(sign(Results$estimate))
p+geom_point( aes(color = estimate.sign ))

R - Inconsistent p-values in running Spearman correlation
