Using apply with multiple data sources?

Time: 2016-05-13 19:19:01

Tags: r for-loop apply

I'm still in the early stages of learning R, but I've written a few functions and am now working toward my final "project".

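I created a function that takes each of my four data sources (different populations), draws histograms, runs Kolmogorov-Smirnov tests, and then plots any significant results for a given row. What I'd like to do is turn this into an apply call. The problem is that my function needs four arguments, and I don't know how to make apply take four data sources.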

hist_fx <- function(w,x,y,z) {
  hist(w,prob=TRUE,col="green",xlim=c(-1,1),ylim=c(0,3))
    lines(density(w),col="red")
    abline(v=c(mean(w)),col="red")

  hist(x,prob=TRUE,col="blue",xlim=c(-1,1),ylim=c(0,3))
    lines(density(x),col="red")
    abline(v=c(mean(x)),col="red")


  hist(y,prob=TRUE,col="yellow",xlim=c(-1,1),ylim=c(0,3))
    lines(density(y),col="red")
    abline(v=c(mean(y)),col="red")


  hist(z,prob=TRUE,col="purple",xlim=c(-1,1),ylim=c(0,3))
    lines(density(z),col="red")
    abline(v=c(mean(z)),col="red")

  all <- c(w,x,y,z)
    hist(all,prob=TRUE,xlim=c(-1,0.5),ylim=c(0,3))
    lines(density(w),col="purple")
    lines(density(x),col="red")
    lines(density(y),col="blue")
    lines(density(z),col="green")

  plot(ecdf(w),col="green")
  plot(ecdf(x),col="blue",add=TRUE)
  plot(ecdf(y),col="red",add=TRUE)
  plot(ecdf(z),col="purple",add=TRUE)

  t1 <- ks.test(w,x)
    print(t1)
  t2 <- ks.test(w,y)
    print(t2)
  t3 <- ks.test(w,z)
    print(t3)

  if(t1$p.value < 0.05) {
        plot(ecdf(w),col="green")
        plot(ecdf(x),col="blue",add=TRUE)
    }
  if(t2$p.value < 0.05) {
        plot(ecdf(w),col="green")
        plot(ecdf(y),col="red",add=TRUE)
    }
  if(t3$p.value < 0.05) {
        plot(ecdf(w),col="green")
        plot(ecdf(z),col="purple",add=TRUE)
    }
}

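I was able to use apply with this function for one population at a time (i.e. after turning hist_fx into a one-argument function). However, I can't find a way to make it work for all four populations at once. I've tinkered with some for loops, but they haven't worked yet.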

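One last thing that may be useful: my data is arranged so that the independent variables are rows and the dependent variables are columns. So I need to run these per row (hence my thought of a for loop).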

Edit:

Here is the input data for one of the populations:

  

> dput(K2)
structure(c(-0.15,0.13,0.23,-0.23,0.06,-0.11,0.107,0.06,
-0.17,0.12,0.06,-0.25,-0.32,0.13,0.06,-0.2,-0.08,0.06,
0.12,0.02,0.11,-0.11,-0.15,0.097,0.347,-0.307,0.097,
-0.047,0.09,0.01,-0.217,0.117,0.03,-0.3,-0.33,0.13,0.19,
-0.24,-0.08,-0.01,0.15,0.61,0.18,-0.15,-0.103,0.135,
0.31,-0.25,0.157,-0.105,-0.08,0.01,-0.165,0.17,0.1,-0.23,
-0.28,0.15,0.13,-0.14,-0.06,0.01,0.07,-0.02,0.11,-0.06,
-0.123,0.13,3.55,-0.27,0.165,-0.065,0.135,0.13,-0.17,
0.135,0.08,-0.21,-0.25,0.2,0.16,-0.18,NA,-0.04,0.05,
-0.02,0.13,-0.14,-0.13,0.098,0.27,-0.193,0.062,-0.08,
0.057,0.028,-0.199,0.1,0.04,-0.24,-0.32,0.13,0.13,-0.15,
-0.05,0.01,0.08,-0.04,0.1,-0.1,-0.14,0.154,0.261,-0.194,
0.1,-0.129,0.063,0.142,-0.136,0.136,0.08,-0.23,-0.24,
0.12,0.1,-0.16,-0.06,0.04,0.09,-0.01,0.04,-0.08,-0.127,
0.133,0.337,-0.06,0.11,-0.107,0.16,0.167,-0.183,0.103,
0.05,-0.2,-0.3,0.22,-0.01,-0.17,-0.14,0.02,0.07,0.01,
0.11,-0.11,-0.155,0.221,0.22,-0.172,0.09,-0.15,0.12,
0.03,-0.153,0.146,0.11,-0.2,-0.24,0.16,0.07,-0.19,-0.1,
0.03,0.17,0.02,0.09,-0.16,-0.062,0.19,0.269,-0.265,0.118,
-0.11,0.126,0.094,-0.186,0.151,0.08,-0.26,-0.31,0.13,
0.09,-0.23,-0.12,0.05,0.13,0.01,0.11,-0.14,-0.095,0.14,
0.24,-0.46,0.09,-0.17,0.08,0.01,-0.24,0.16,0.04,-0.38,
-0.39,0.11,0.06,-0.31,-0.25,0.03,0.21,-0.14,0,-0.22,
-0.07,0.148,0.311,-0.27,0.11,-0.055,0.16,0.04,-0.197,
0.064,0.09,-0.24,-0.34,0.17,0.07,-0.15,-0.18,0.03,0.13,
0.07,0.13,-0.08,-0.136,0.142,0.27,-0.257,0.1,-0.13,0.103,
0.064,-0.197,0.118,0.06,-0.29,-0.35,0.13,0.1,-0.19,-0.13,
0.01,0.1,-0.01,0.13,-0.15), .Dim = c(22L, 12L))

To clarify further, this is the format of the actual data frame:

c1 c2 c3 c4
r2 x x x
r3 x x x
r4 x x x

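Each column represents the value of one star for the variable on that row. So I want to create a histogram for each row of each data set.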

For the function's arguments, I just used these variable names for simplicity: w = population 1, x = population 2, y = population 3, z = population 4.

For example:

 > hist_fx(k2[1,],n2[1,],j2[1,],g2[1,])

    Two-sample Kolmogorov-Smirnov test

data:  w and x
D = 1, p-value = 1.229e-05
alternative hypothesis: two-sided


    Two-sample Kolmogorov-Smirnov test

data:  w and y
D = 1, p-value = 1.229e-05
alternative hypothesis: two-sided


    Two-sample Kolmogorov-Smirnov test

data:  w and z
D = 1, p-value = 1.229e-05
alternative hypothesis: two-sided

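My problem is that currently I can only run the function one row at a time. I'd like to be able to do it for all rows. I was thinking of using apply, since I've used it in a very similar context, except with only one data source.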

1 answer:

Answer 0: (score: 0)

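Not quite sure of your needs, but consider transposing with t() so that the plots run column-wise over your row data. Also consider mapply(), the multivariate member of the apply family, which runs an operation element-wise across equal-length objects simultaneously. It may even be worth splitting the operations, since running them all together may print/plot only the last iteration to the screen.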

Transpose (the data used is a slight variation of the posted dput matrix)

pop1 <- data.frame(t(data))
pop2 <- data.frame(t(data))
pop3 <- data.frame(t(data))
pop4 <- data.frame(t(data))
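
To see why the transpose matters: a data frame is a list of columns, so mapply() walks several data frames column by column in parallel. After t(), each original row becomes one column. A minimal sketch with a made-up 3 x 4 matrix (m, df and row_sums are illustrative names, not part of the question's data):

```r
# Hypothetical toy matrix: 3 variables (rows) x 4 observations (columns)
m <- matrix(1:12, nrow = 3, byrow = TRUE)

# After transposing, each original row is one column of the data frame
df <- data.frame(t(m))

# mapply() steps through the columns of both arguments in parallel;
# the same data frame is passed twice purely for illustration
row_sums <- mapply(function(a, b) sum(a) + sum(b), df, df)

# One result per row of the original matrix, named after the columns of df
print(row_sums)
```

The same pattern extends to four arguments, which is what the calls further down rely on.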

Histograms

hist_fx <- function(w,x,y,z) {

  whist <- hist(w,prob=TRUE,col="green",xlim=c(-1,1),ylim=c(0,3))
  lines(density(w),col="red")
  abline(v=c(mean(w)),col="red")

  xhist <- hist(x,prob=TRUE,col="blue",xlim=c(-1,1),ylim=c(0,3))
  lines(density(x),col="red")
  abline(v=c(mean(x)),col="red")      

  yhist <- hist(y,prob=TRUE,col="yellow",xlim=c(-1,1),ylim=c(0,3))
  lines(density(y),col="red")
  abline(v=c(mean(y)),col="red")      

  zhist <- hist(z,prob=TRUE,col="purple",xlim=c(-1,1),ylim=c(0,3))
  lines(density(z),col="red")
  abline(v=c(mean(z)),col="red")

}

# HISTOGRAM PLOTS FOR EACH DF COLUMN 
output <- mapply(hist_fx, w=pop1, x=pop2, y=pop3, z=pop4)
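
Since only the last plot may remain visible on screen, one option is to send every iteration to a multi-page PDF. The sketch below is an assumption-laden illustration, not the asker's setup: it uses synthetic data for two populations only, a hypothetical helper plot_pair, and a temporary output file. It also passes na.rm = TRUE, because the posted matrix contains an NA and density() stops on missing values by default.

```r
# Synthetic stand-ins for two populations (assumed shape: 22 variables x 12 values)
set.seed(42)
pop1 <- data.frame(t(matrix(rnorm(22 * 12, sd = 0.2), nrow = 22)))
pop2 <- data.frame(t(matrix(rnorm(22 * 12, sd = 0.2), nrow = 22)))
pop1[3, 5] <- NA  # mimic the missing value in the posted data

# Hypothetical helper: one histogram page per population
plot_pair <- function(w, x) {
  hist(w, prob = TRUE, col = "green", xlim = c(-1, 1), ylim = c(0, 3))
  lines(density(w, na.rm = TRUE), col = "red")    # drop NAs before estimating
  abline(v = mean(w, na.rm = TRUE), col = "red")

  hist(x, prob = TRUE, col = "blue", xlim = c(-1, 1), ylim = c(0, 3))
  lines(density(x, na.rm = TRUE), col = "red")
  abline(v = mean(x, na.rm = TRUE), col = "red")
  invisible(NULL)
}

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)                                     # each plot becomes a page
invisible(mapply(plot_pair, w = pop1, x = pop2))
invisible(dev.off())                              # close the device to flush the file
```

Opening out_file afterwards shows every histogram in order, two pages per original row.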

Kolmogorov-Smirnov tests (using slight variations of the input data)

hist_fx <- function(w,x,y,z) {
  t1 <- ks.test(w,x)      
  t2 <- ks.test(w,y)      
  t3 <- ks.test(w,z)   

  if(t1$p.value < 0.05) {
     plot(ecdf(w),col="green")
     plot(ecdf(x),col="blue",add=TRUE)
  }
  if(t2$p.value < 0.05) {
     plot(ecdf(w),col="green")
     plot(ecdf(y),col="red",add=TRUE)
  }
  if(t3$p.value < 0.05) {
     plot(ecdf(w),col="green")
     plot(ecdf(z),col="purple",add=TRUE)
  }

  return(c(t1, t2, t3))
}

output <- mapply(hist_fx, w=pop1, x=pop2, y=pop3, z=pop4)

output    
#             X1                                  
# statistic   0.1666667                           
# p.value     0.9962552                           
# alternative "two-sided"                         
# method      "Two-sample Kolmogorov-Smirnov test"
# data.name   "w and x"                           
# statistic   0.25                                
# p.value     0.8474885                           
# alternative "two-sided"                         
# method      "Two-sample Kolmogorov-Smirnov test"
# data.name   "w and y"                           
# statistic   0.08333333                          
# p.value     1                                   
# alternative "two-sided"                         
# method      "Two-sample Kolmogorov-Smirnov test"
# data.name   "w and z"                           
#             X2                                  
# statistic   0.25                                
# p.value     0.8474885                           
# alternative "two-sided"                         
# method      "Two-sample Kolmogorov-Smirnov test"
# data.name   "w and x"                           
# statistic   0.08333333                          
# p.value     1                                   
# alternative "two-sided"                         
# method      "Two-sample Kolmogorov-Smirnov test"
# data.name   "w and y"                           
# statistic   0.1666667                           
# p.value     0.9962552                           
# alternative "two-sided"                         
# method      "Two-sample Kolmogorov-Smirnov test"
# data.name   "w and z"           
# ...
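
The output above is long because c(t1, t2, t3) concatenates the three test results into one flat list per column. If only the p-values are of interest, a more compact variant is to return a named numeric vector, which mapply() simplifies into a matrix. This is a sketch on synthetic data; make_pop, ks_pvalues and the shifted means are assumptions made for illustration only.

```r
# Synthetic populations: 5 variables x 12 values each, transposed as above
set.seed(1)
make_pop <- function(shift) {
  data.frame(t(matrix(rnorm(5 * 12, mean = shift, sd = 0.2), nrow = 5)))
}
pop1 <- make_pop(0)
pop2 <- make_pop(0)
pop3 <- make_pop(0.1)
pop4 <- make_pop(1)    # deliberately far from pop1

# Return only the three p-values, with descriptive names
ks_pvalues <- function(w, x, y, z) {
  c(w_vs_x = ks.test(w, x)$p.value,
    w_vs_y = ks.test(w, y)$p.value,
    w_vs_z = ks.test(w, z)$p.value)
}

# mapply() yields a 3 x 5 matrix; t() gives one row per original variable
pvals <- t(mapply(ks_pvalues, w = pop1, x = pop2, y = pop3, z = pop4))

# Logical matrix flagging comparisons below the 0.05 threshold
significant <- pvals < 0.05
print(pvals)
```

The significant matrix can then drive the ECDF plots for just the flagged row and comparison pairs.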