Define windows to compute the mean in R

Date: 2021-01-11 12:43:43

Tags: r

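I'm having trouble in R writing code for a very large table.

I want to compute the mean of variable_1 and variable_2 over windows of 500 positions (column pos), with a step of 500 positions, so the windows do not overlap. An example will probably make this easier to understand.

Input table: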

data_FST = data.frame(scaffold=c(rep("Scaffold_1",1000),
                                 rep("Scaffold_2",2000),
                                 rep("Scaffold_3",450)),
                      variable_1=sample(1:5000,3450, replace=TRUE),
                      variable_2=sample(1:5000,3450, replace=TRUE),
                      pos=c(seq(1,2000,2),1:2000,1:450))

Desired output table:

scaffold   pos  variable_1               variable_2
Scaffold_1 500  mean_variable(1:500)     mean_variable(1:500)
Scaffold_1 1000 mean_variable(501:1000)  mean_variable(501:1000)
Scaffold_2 500  mean_variable(1:500)     mean_variable(1:500)
Scaffold_2 1000 mean_variable(501:1000)  mean_variable(501:1000)
Scaffold_2 1500 mean_variable(1001:1500) mean_variable(1001:1500)
Scaffold_2 2000 mean_variable(1501:2000) mean_variable(1501:2000)
Scaffold_3 500  mean_variable(1:500)     mean_variable(1:500)

Thanks a lot

1 answer:

Answer 0 (score: 1)

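You can create a new grouping column by dividing pos by 500, rounding up, and multiplying by 500 again, so every row is labelled with the upper bound of its window. Then take the mean of the variable within each scaffold and window.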

library(dplyr)

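data_FST %>%
  # Label each row with the upper bound of its 500-position window
  group_by(scaffold, pos = ceiling(pos/500) * 500) %>%
  # Average variable_1 within each scaffold/window pair
  summarise(variable_1 = mean(variable_1), .groups = "drop")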

#  scaffold     pos variable_1
#  <chr>      <dbl>      <dbl>
#1 Scaffold_1   500       126.
#2 Scaffold_1  1000       376.
#3 Scaffold_1  1500       626.
#4 Scaffold_1  2000       876.
#5 Scaffold_2   500      1250.
#6 Scaffold_2  1000      1750.
#7 Scaffold_2  1500      2250.
#8 Scaffold_2  2000      2750.
#9 Scaffold_3   500      3226.
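
The question asks for both variable_1 and variable_2, so here is a minimal sketch of the same grouping idea applied to several columns at once. It assumes dplyr 1.0.0 or later (for across()), and the toy data frame is a hypothetical stand-in for data_FST, with values picked only so the means are easy to check by hand.

```r
library(dplyr)

# Hypothetical stand-in for data_FST (not the asker's data); small and
# deterministic so the window means can be verified by hand.
toy <- data.frame(
  scaffold   = c(rep("Scaffold_1", 4), rep("Scaffold_2", 2)),
  pos        = c(1, 500, 501, 1000, 250, 750),
  variable_1 = c(10, 20, 30, 50, 7, 9),
  variable_2 = c(1, 3, 5, 7, 2, 4)
)

# ceiling(pos / 500) * 500 maps positions 1..500 to 500, 501..1000 to 1000, ...
ceiling(c(1, 500, 501, 1000) / 500) * 500
# gives 500 500 1000 1000

windowed <- toy %>%
  # One group per scaffold and 500-position window
  group_by(scaffold, pos = ceiling(pos / 500) * 500) %>%
  # Average every listed column within each group
  summarise(across(c(variable_1, variable_2), mean), .groups = "drop")

windowed
```

On the real table, replacing toy with data_FST should return one row per scaffold and window, with pos holding the window's upper bound. Windows that contain no rows simply do not appear in the result.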