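I have 208 columns, and every sample has a replicate column (so 104 samples × 2 in total). I want to use a loop in R to take the mean of each pair of replicates. Can anyone suggest how?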
w x y a b e
5 1 1 2 4 1
6 2 2 5 3 6
7 3 3 8 9 3
8 4 6 9 1 3
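So, for example, I have the columns w, x, y, a, b and e. I want to average w with x, y with a, and b with e, and store the averages in another data frame with columns named w_x, y_a and b_e.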
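Assuming the 208 columns are laid out as adjacent replicate pairs (as in the w/x, y/a, b/e example), sample k occupies columns 2k - 1 and 2k. A minimal sketch of that index arithmetic, which any of the approaches below relies on:

```r
n_cols <- 208                        # total columns in the data (from the question)
first_idx <- seq(1, n_cols, by = 2)  # 1, 3, 5, ..., 207: first replicate of each sample
second_idx <- first_idx + 1          # 2, 4, 6, ..., 208: second replicate of each sample
length(first_idx)                    # number of samples: 104
```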
Answer 0 (score: 1)
You can also use dplyr + tidyr:
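library(dplyr)
library(tidyr)
cols = colnames(df)
# Label each pair "first_second", e.g. "w_x": odd positions hold the first replicate, even positions the second
pair_ids = paste(cols[seq_along(cols) %% 2 == 1], cols[seq_along(cols) %% 2 == 0], sep = "_")
data.frame(t(df)) %>%                               # transpose: one row per original column
  mutate(ID = rep(pair_ids, each = 2)) %>%          # both replicates of a pair share an ID
  group_by(ID) %>%
  summarize_all(mean) %>%                           # average the two rows of each pair
  gather(variable, value, -ID, factor_key = TRUE) %>% # factor key keeps rows in original order ("X10" would otherwise sort before "X2")
  spread(ID, value) %>%                             # back to wide: one column per pair, sorted alphabetically
  select(-variable)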
Result:
# A tibble: 4 x 3
b_e w_x y_a
* <dbl> <dbl> <dbl>
1 2.5 3 1.5
2 4.5 4 3.5
3 6.0 5 5.5
4 2.0 6 7.5
Data:
df = read.table(text = "w x y a b e
5 1 1 2 4 1
6 2 2 5 3 6
7 3 3 8 9 3
8 4 6 9 1 3", header = TRUE)
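The same transpose, group and average idea also works in base R without dplyr or tidyr. A minimal sketch, assuming the replicate pairs are adjacent columns and the column count is even; it uses rowsum() with reorder = FALSE so the pairs keep their original left-to-right order instead of being sorted alphabetically:

```r
# Sample data from the question
df <- data.frame(w = c(5, 6, 7, 8), x = c(1, 2, 3, 4), y = c(1, 2, 3, 6),
                 a = c(2, 5, 8, 9), b = c(4, 3, 9, 1), e = c(1, 6, 3, 3))

cols <- colnames(df)
# One label per column, e.g. "w_x" for both w and x
pair_id <- rep(paste(cols[c(TRUE, FALSE)], cols[c(FALSE, TRUE)], sep = "_"), each = 2)

# Transpose so every original column is a row, then sum rows that share a label;
# reorder = FALSE keeps groups in the order they are first encountered
sums <- rowsum(t(df), group = pair_id, reorder = FALSE)

# Each group holds exactly two replicates: divide by 2 and transpose back
result <- as.data.frame(t(sums) / 2)
result
```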
Answer 1 (score: 0)
mtcarsd <- mtcars[1:6]
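To select the first column of every pair, index with c(TRUE, FALSE); the logical vector is recycled across all columns, and c(FALSE, TRUE) likewise selects the second column of every pair:
first_cols <- mtcarsd[c(TRUE, FALSE)]
sec_cols <- mtcarsd[c(FALSE, TRUE)]
fs <- (first_cols + sec_cols) / 2
fs now holds the row-by-row average of each pair of columns. Use sapply to get the overall mean of each averaged column:
sapply(fs, mean)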
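Applied to the question's data, the same recycling trick gives the per-row averages with the requested column names. A sketch, assuming replicates are adjacent and the column count is even (the stopifnot guard checks the latter); pair_avg is an illustrative name:

```r
# Sample data from the question
df <- data.frame(w = c(5, 6, 7, 8), x = c(1, 2, 3, 4), y = c(1, 2, 3, 6),
                 a = c(2, 5, 8, 9), b = c(4, 3, 9, 1), e = c(1, 6, 3, 3))

# The recycling trick only pairs columns correctly when there is an even number of them
stopifnot(ncol(df) %% 2 == 0)

first_cols <- df[c(TRUE, FALSE)]  # w, y, b
sec_cols   <- df[c(FALSE, TRUE)]  # x, a, e

# Element-wise average of each replicate pair, row by row
pair_avg <- (first_cols + sec_cols) / 2
names(pair_avg) <- paste(names(first_cols), names(sec_cols), sep = "_")
pair_avg
```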
Answer 2 (score: 0)
Here is a detailed example that uses a loop.
df <- data.frame(w = c(5, 6, 7, 8),
x = c(1, 2, 3, 4),
y = c(1, 2, 3, 6),
a = c(2, 5, 8, 9),
b = c(4, 3, 9, 1),
e = c(1, 6, 3, 3))
str(df)
# indices of the first column of each pair (1, 3, 5, ...), which the loop iterates over
vect <- seq_len(ncol(df))[seq_len(ncol(df)) %% 2 != 0]
# extract the data frame two columns at a time
# initialize the lists
new.lst <- list() # list of data frames, each holding two consecutive columns
ave.list <- list() # list of row-wise averages
for(i in seq_along(vect)){
new.lst[[i]] <- df[, seq(from = vect[i], to = (vect[i] + 1))]
ave.list[[i]] <- rowMeans(new.lst[[i]], na.rm = TRUE)
names(ave.list)[i] <- paste(colnames(new.lst[[i]])[1],
colnames(new.lst[[i]])[2],
sep = "_") # set the names of columns
}
new.lst # list of two-column data frames, now filled
ave.list # named list of row-wise averages, now filled
# final dataframe
df2 <- as.data.frame.list(ave.list)
df2
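The loop can also be written with lapply, which builds the named list in one step instead of growing it by index. A sketch under the same assumption that each replicate pair occupies two adjacent columns; starts and avg_list are illustrative names:

```r
# Sample data from the question
df <- data.frame(w = c(5, 6, 7, 8), x = c(1, 2, 3, 4), y = c(1, 2, 3, 6),
                 a = c(2, 5, 8, 9), b = c(4, 3, 9, 1), e = c(1, 6, 3, 3))

# Starting column of each replicate pair: 1, 3, 5, ...
starts <- seq(1, ncol(df), by = 2)

# Row-wise mean of each two-column slice
avg_list <- lapply(starts, function(i) rowMeans(df[, c(i, i + 1)], na.rm = TRUE))
names(avg_list) <- paste(names(df)[starts], names(df)[starts + 1], sep = "_")

df3 <- as.data.frame(avg_list)
df3
```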