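I want to average the data in columns that share the same name pattern. If the data were all numeric, some of these examples would be very useful: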
How to calculate the mean of those columns in a data frame with the same column name
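However, I also have a factor column. I could remove this column and then cbind() it back afterwards, but that seems clumsy. Is there a way to use something like !is.factor(x) to ignore my other column?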
df <-
as.data.frame(matrix(c(1,3,3,2,2,5,3,2,3,6,3,2,4,7,3,2,5,4,5,2,6,3,5,2),
ncol=6,
dimnames=list(NULL, c("A.1", "B.1", "C.1", "B.2", "A.2", "C.2"))))
char = c("Apple", "banana", "cat", "rainbow")
df = cbind(char, df)
res <- as.data.frame(sapply(unique(names(df)), function(col)
rowMeans(df[names(df) == col] )))
The expected result is:
res
char A B C
Apple 3.0 3 4.5
banana 3.5 6 4.5
cat 4.0 3 4.0
rainbow 2.0 2 2.0
The error is:
` Error in rowMeans(df[names(df) == col]) : 'x' must be numeric `
Answer 0 (score: 0)
Using the tidyverse, I came up with the following pipeline:
##Recreate the data
df <- as.data.frame(matrix(c(1,3,3,2,2,5,3,2,3,6,3,2,4,7,3,2,5,4,5,2,6,3,5,2),
ncol=6,
dimnames=list(NULL, c("A.1", "B.1", "C.1", "B.2", "A.2", "C.2"))))
char = c("Apple", "banana", "cat", "rainbow")
df = cbind(char, df)
##Load tidyverse
library(tidyverse)
#Gather the columns with titles, extract the first letter, then summarize
new_df <- df %>% gather(column_type, value, `A.1`:`C.2`) %>%
mutate(initial = str_extract(column_type, "[A-Z]")) %>%
group_by(initial, char) %>%
summarise(mean = mean(value)) %>%
spread(initial, mean)
new_df
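gather() and spread() still work but have been superseded in tidyr. As a rough sketch of the same idea with pivot_longer() and pivot_wider(), assuming dplyr >= 1.0 and tidyr >= 1.0 are installed: where(is.numeric) picks only the numeric columns, so the char column is left alone without naming it. The column names group and rep are just labels I chose for illustration.

```r
library(dplyr)
library(tidyr)

# Recreate the example data from the question
df <- as.data.frame(matrix(c(1,3,3,2,2,5,3,2,3,6,3,2,4,7,3,2,5,4,5,2,6,3,5,2),
                           ncol = 6,
                           dimnames = list(NULL, c("A.1", "B.1", "C.1", "B.2", "A.2", "C.2"))))
df <- cbind(char = c("Apple", "banana", "cat", "rainbow"), df)

res <- df %>%
  # Reshape only the numeric columns; "A.1" is split into group = "A", rep = "1"
  pivot_longer(where(is.numeric),
               names_to = c("group", "rep"),
               names_sep = "\\.") %>%
  # Average the values within each char / group combination
  group_by(char, group) %>%
  summarise(mean = mean(value), .groups = "drop") %>%
  # Back to one column per group
  pivot_wider(names_from = group, values_from = mean)

res
```

This should give one row per char value with columns A, B and C, the same shape as the expected result in the question.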
Answer 1 (score: 0)
For a base R solution that extends your existing code:
df <-
as.data.frame(matrix(c(1,3,3,2,2,5,3,2,3,6,3,2,4,7,3,2,5,4,5,2,6,3,5,2),
ncol=6,
dimnames=list(NULL, c("A.1", "B.1", "C.1", "B.2", "A.2", "C.2"))))
char = c("Apple", "banana", "cat", "rainbow")
df <- cbind(char, df)
names(df) <- gsub('\\.\\d+$', '', names(df)) ## strips the ".<digit>" suffix so columns in the same group share a name
res <-
data.frame(
factor = df$char,
sapply(setdiff(unique(names(df)), 'char'), function(col)
rowMeans(df[, names(df) == col]))
)
> res
factor A B C
1 Apple 3.0 3 4.5
2 banana 3.5 6 4.5
3 cat 4.0 3 4.0
4 rainbow 2.0 2 2.0
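If you would rather not hard-code the name 'char', here is a minimal sketch that is closer to the !is.factor(x) idea from the question. It selects columns by type with is.numeric(), so any factor or character column is skipped automatically. The helper names is_num, num, grp and means are my own, and the pattern assumes the group suffix is always a dot followed by digits.

```r
# Recreate the example data from the question
df <- as.data.frame(matrix(c(1,3,3,2,2,5,3,2,3,6,3,2,4,7,3,2,5,4,5,2,6,3,5,2),
                           ncol = 6,
                           dimnames = list(NULL, c("A.1", "B.1", "C.1", "B.2", "A.2", "C.2"))))
df <- cbind(char = c("Apple", "banana", "cat", "rainbow"), df)

# Flag the numeric columns; factor and character columns come out FALSE
is_num <- sapply(df, is.numeric)
num <- df[is_num]

# Group label for each numeric column: the name without its ".<digits>" suffix
grp <- sub("\\.\\d+$", "", names(num))

# split.default() splits a data frame by columns; take row means per group
means <- sapply(split.default(num, grp), rowMeans)

# Put the non-numeric column(s) back in front, unchanged
res <- cbind(df[!is_num], means)
res
```

Because the original column names are never overwritten, df stays intact and the same code works if more non-numeric columns are added later.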
Answer 2 (score: 0)
In base R, you are looking for the following:
aggregate(.~char, reshape(df, 2:ncol(df), idvar = 'char',dir = 'long'), mean)[-2]
char A B C
1 Apple 3.0 3 4.5
2 banana 3.5 6 4.5
3 cat 4.0 3 4.0
4 rainbow 2.0 2 2.0
Or with data.table:
library(data.table)
melt(setDT(df),'char',patterns(A='^A',B='^B',C='^C'))[,-2][,lapply(.SD,mean),by=char]
char A B C
1: Apple 3.0 3 4.5
2: banana 3.5 6 4.5
3: cat 4.0 3 4.0
4: rainbow 2.0 2 2.0