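I'm having trouble writing R code for a very large table.
For each window of 500 positions (column $pos), moving in steps of 500 positions, I want to compute the mean of $variable_1 and $variable_2. An example will probably make it easier to understand.
Input table: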
data_FST = data.frame(scaffold=c(rep("Scaffold_1",1000),
rep("Scaffold_2",2000),
rep("Scaffold_3",450)),
variable_1=sample(1:5000,3450, replace=TRUE),
variable_2=sample(1:5000,3450, replace=TRUE),
pos=c(seq(1,2000,2),1:2000,1:450))
Desired output table:
scaffold     pos  variable_1                variable_2
Scaffold_1   500  mean_variable(1:500)      mean_variable(1:500)
Scaffold_1  1000  mean_variable(501:1000)   mean_variable(501:1000)
Scaffold_2   500  mean_variable(1:500)      mean_variable(1:500)
Scaffold_2  1000  mean_variable(501:1000)   mean_variable(501:1000)
Scaffold_2  1500  mean_variable(1001:1500)  mean_variable(1001:1500)
Scaffold_2  2000  mean_variable(1501:2000)  mean_variable(1501:2000)
Scaffold_3   500  mean_variable(1:500)      mean_variable(1:500)
Many thanks!
Answer 0 (score: 1)
You can create a new group by dividing the pos values into bins of 500 and then taking the mean of each variable within each group.
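To see what the grouping key ceiling(pos / 500) * 500 does, here is a minimal base R check (dplyr is not needed for it). It maps a few boundary positions to their window label and counts rows per scaffold and window in the question's input. The names boundary_pos, counts and window_end are illustrative labels only.

```r
# Rebuild the example input from the question (the variable values don't matter here)
data_FST <- data.frame(scaffold = c(rep("Scaffold_1", 1000),
                                    rep("Scaffold_2", 2000),
                                    rep("Scaffold_3", 450)),
                       variable_1 = sample(1:5000, 3450, replace = TRUE),
                       variable_2 = sample(1:5000, 3450, replace = TRUE),
                       pos = c(seq(1, 2000, 2), 1:2000, 1:450))

# Each position maps to the upper bound of its 500-wide window:
# 1..500 -> 500, 501..1000 -> 1000, 1001..1500 -> 1500, ...
boundary_pos <- c(1, 500, 501, 1000, 1001)
print(ceiling(boundary_pos / 500) * 500)
# [1]  500  500 1000 1000 1500

# Number of rows that fall into each window, per scaffold
counts <- with(data_FST, table(scaffold, window_end = ceiling(pos / 500) * 500))
print(counts)
```

Because Scaffold_1 uses pos = seq(1, 2000, 2), its positions run up to 1999. It therefore spans four windows of 250 rows each, not the two shown in the desired output, which is why the output below lists four Scaffold_1 rows.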
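library(dplyr)
data_FST %>%
  # Replace pos by the upper bound of its 500-position window (1-500 -> 500, 501-1000 -> 1000, ...)
  group_by(scaffold, pos = ceiling(pos / 500) * 500) %>%
  summarise(variable_1 = mean(variable_1),
            variable_2 = mean(variable_2),
            .groups = "drop")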
# Illustrative output (variable_1 column only; the exact numbers depend on the random input):
# scaffold pos variable_1
# <chr> <dbl> <dbl>
#1 Scaffold_1 500 126.
#2 Scaffold_1 1000 376.
#3 Scaffold_1 1500 626.
#4 Scaffold_1 2000 876.
#5 Scaffold_2 500 1250.
#6 Scaffold_2 1000 1750.
#7 Scaffold_2 1500 2250.
#8 Scaffold_2 2000 2750.
#9 Scaffold_3 500 3226.
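If dplyr is not an option, the same windowed means can be sketched in base R with aggregate(). This is a swapped-in alternative, not part of the answer above. The helper column pos_window and the result name window_means are hypothetical.

```r
data_FST <- data.frame(scaffold = c(rep("Scaffold_1", 1000),
                                    rep("Scaffold_2", 2000),
                                    rep("Scaffold_3", 450)),
                       variable_1 = sample(1:5000, 3450, replace = TRUE),
                       variable_2 = sample(1:5000, 3450, replace = TRUE),
                       pos = c(seq(1, 2000, 2), 1:2000, 1:450))

# Label each row with the upper bound of its 500-position window
data_FST$pos_window <- ceiling(data_FST$pos / 500) * 500

# Mean of both variables for every scaffold/window combination
window_means <- aggregate(cbind(variable_1, variable_2) ~ scaffold + pos_window,
                          data = data_FST, FUN = mean)

# aggregate() varies the first grouping variable fastest; sort by scaffold, then window
window_means <- window_means[order(window_means$scaffold, window_means$pos_window), ]
names(window_means)[names(window_means) == "pos_window"] <- "pos"
print(window_means, row.names = FALSE)
```

The formula interface with cbind() on the left-hand side summarises both columns in one call. Only windows that contain at least one row appear in the result, so Scaffold_3 gets a single row.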