My data looks like this:
AAA_1 AAA_2 AAA_3 BBB_1 BBB_2 BBB_3 CCC
1 1 1 1 2 2 2 1
2 3 1 4 0 0 0 0
3 5 3 0 1 1 1 1
For each row, I want to compute a mean over the columns that share one of the following features:
feature <- c("AAA","BBB","CCC")
The desired output should look like this:
AAA BBB CCC
1 1 2 1
2 2.6 0 0
3 2.6 1 1
For each pattern separately, I am able to do this:
data <- read.table("data.txt", header = TRUE, row.names = 1)
AAA <- as.matrix(rowMeans(data[ , grepl("AAA", names(data))]))
But I don't know how to do the partial matching for the different patterns in one line.
I also tried a few other things:
for (i in 1:length(features)){
feature[i] <- as.matrix(rowMeans(data[ , grepl(feature[i] , names( data ) ) ]))
}
Answer 0: (score: 2)
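Here is another option for you. Given your column name pattern, I chose to use gsub() to strip the numeric suffix, which leaves the first three letters. With ind, which contains AAA, BBB, and CCC, I used lapply() to subset the data for each element of ind, calculate the row means, and keep only the row-mean column. Then I used bind_cols() and created foo. The last step was to assign the column names to foo.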
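library(dplyr)
# Strip the trailing "_<digits>" suffix to get the unique column prefixes
ind <- unique(gsub("_\\d+$", "", names(mydf)))
# For each prefix, select the matching columns and keep only their row means
lapply(ind, function(x){
select(mydf, contains(x)) %>%
transmute(out = rowMeans(.))
}) %>%
bind_cols() -> foo
# Name the result columns after the prefixes
names(foo) <- ind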
# AAA BBB CCC
# (dbl) (dbl) (dbl)
#1 1.000000 2 1
#2 2.666667 0 0
#3 2.666667 1 1
Data:
mydf <- structure(list(AAA_1 = c(1L, 3L, 5L), AAA_2 = c(1L, 1L, 3L),
AAA_3 = c(1L, 4L, 0L), BBB_1 = c(2L, 0L, 1L), BBB_2 = c(2L,
0L, 1L), BBB_3 = c(2L, 0L, 1L), CCC = c(1L, 0L, 1L)), .Names = c("AAA_1",
"AAA_2", "AAA_3", "BBB_1", "BBB_2", "BBB_3", "CCC"), class = "data.frame", row.names = c(NA,
-3L))
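To see what the gsub() step produces on its own, here is a minimal base R sketch. The vector cols is a hypothetical stand-in for names(mydf), and prefixes is an illustrative intermediate name that does not appear in the answer.

```r
# Hypothetical stand-in for names(mydf)
cols <- c("AAA_1", "AAA_2", "AAA_3", "BBB_1", "BBB_2", "BBB_3", "CCC")

# Remove a trailing underscore followed by digits;
# "CCC" has no such suffix, so it is left unchanged
prefixes <- gsub("_\\d+$", "", cols)

# Keep one copy of each prefix, in order of first appearance
ind <- unique(prefixes)
print(ind)
# [1] "AAA" "BBB" "CCC"
```

Note that contains(x) matches the prefix anywhere in a column name. If some names could contain a prefix in the middle, starts_with(x) may be the safer selector; that is a suggestion, not part of the answer above.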
Answer 1: (score: 2)
Assuming your colnames always have the structure shown in the example, you can split the names and aggregate.
new_names <- unlist(strsplit(names(df),"\\_.*"))
colnames(df) <- new_names
#Testing with your data, we need to prevent the loss of dimension by using drop = FALSE
sapply(unique(new_names), function(i) rowMeans(df[, new_names==i, drop = FALSE]))
# AAA BBB CCC
#[1,] 1.000000 2 1
#[2,] 2.666667 0 0
#[3,] 2.666667 1 1
Data:
df <- structure(list(AAA_1 = c(1L, 3L, 5L), AAA_2 = c(1L, 1L, 3L),
AAA_3 = c(1L, 4L, 0L), BBB_1 = c(2L, 0L, 1L), BBB_2 = c(2L,
0L, 1L), BBB_3 = c(2L, 0L, 1L), CCC = c(1L, 0L, 1L)), .Names = c("AAA_1",
"AAA_2", "AAA_3", "BBB_1", "BBB_2", "BBB_3", "CCC"), class = "data.frame", row.names = c(NA,
-3L))
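If overwriting the column names of df is undesirable, one possible variant groups the columns with split.default() instead. This is only a sketch of the same idea, not the answer's own code; grp and res are illustrative names, and the data frame is rebuilt here so the snippet runs on its own.

```r
# Same data as above, rebuilt so the snippet is self-contained
df <- data.frame(
  AAA_1 = c(1L, 3L, 5L), AAA_2 = c(1L, 1L, 3L), AAA_3 = c(1L, 4L, 0L),
  BBB_1 = c(2L, 0L, 1L), BBB_2 = c(2L, 0L, 1L), BBB_3 = c(2L, 0L, 1L),
  CCC   = c(1L, 0L, 1L)
)

# Prefix of each column: everything before the first underscore
grp <- sub("_.*", "", names(df))

# split.default() splits the columns (not the rows) of df by prefix.
# Each piece is still a data frame, so rowMeans() also works for the
# single-column CCC group without needing drop = FALSE.
res <- sapply(split.default(df, grp), rowMeans)
print(res)
```

The groups come back in sorted prefix order, which here is the same as the original column order, and df itself keeps its original column names.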
Answer 2: (score: 1)
price_range num
=====================
10.00 - 19.99 10
20.00 - 29.99 6
30.00 - 39.99 6