Question

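I want to compute rowMeans over a range of columns, but I can't hard-code the column names (e.g. c(C1, C3)) or a range (e.g. C1:C3), because both the names and the range vary. My df looks like this: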

> df
  chr name age  MGW.1 MGW.2  MGW.3 HEL.1 HEL.2 HEL.3
1 123  abc  12  10.00    19  18.00    12 13.00   -14
2 234  bvf  24 -13.29    13  -3.02    12 -0.12    24
3 376  bxc  17  -6.95    10 -18.00    15  4.00    -4

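This is just a sample; in reality my columns run MGW.1 ... MGW.196 and so on. Rather than giving exact column names or an exact range, I would like to pass the leading part (prefix) of the column names and get the mean of all columns that share that prefix. Something like: MGW=rowMeans(df[,MGW.*]), HEL=rowMeans(df[,HEL.*])

So my final output should look like this: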

> df
      chr name age  MGW      Hel
    1 123  abc  12  10.00    19
    2 234  bvf  24  13.29    13
    3 376  bxc  17  -6.95    10

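I know these values are not correct; they are only there to give you an idea. Second, I want to remove from the data frame every row that contains only NA across the whole row apart from the first 3 values.

Here is the dput of the sample input: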

> dput(df)
structure(list(chr = c(123L, 234L, 376L), name = structure(1:3, .Label = c("abc", 
"bvf", "bxc"), class = "factor"), age = c(12L, 24L, 17L), MGW.1 = c(10, 
-13.29, -6.95), MGW.2 = c(19L, 13L, 10L), MGW.3 = c(18, -3.02, 
-18), HEL.1 = c(12L, 12L, 15L), HEL.2 = c(13, -0.12, 4), HEL.3 = c(-14L, 
24L, -4L)), .Names = c("chr", "name", "age", "MGW.1", "MGW.2", 
"MGW.3", "HEL.1", "HEL.2", "HEL.3"), class = "data.frame", row.names = c(NA, 
-3L))

Answer 1

First

I think this is what you are looking for to get the row means:

# "\\." matches a literal dot; an unescaped "." would match any character
df$mean.Hel <- rowMeans(df[, grep("^HEL\\.", names(df))])

Then remove the columns:

df[, grep("^HEL\\.", names(df))] <- NULL

Second

Delete the rows that contain only NA after the first three elements.

rows.delete <- which(rowSums(!is.na(df)[,4:ncol(df)]) == 0)
df <- df[!(1:nrow(df) %in% rows.delete),]
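The two steps above handle one prefix at a time. As a rough sketch of how they might be generalized to every prefix without naming any of them, the base R snippet below derives the prefixes from the column names themselves. It assumes that the first three columns are identifiers and that every other column is named prefix.number; the variable names (id_cols, val_cols, grp) are only illustrative.

```r
# Sample data from the question
df <- data.frame(
  chr   = c(123L, 234L, 376L),
  name  = c("abc", "bvf", "bxc"),
  age   = c(12L, 24L, 17L),
  MGW.1 = c(10, -13.29, -6.95),
  MGW.2 = c(19L, 13L, 10L),
  MGW.3 = c(18, -3.02, -18),
  HEL.1 = c(12L, 12L, 15L),
  HEL.2 = c(13, -0.12, 4),
  HEL.3 = c(-14L, 24L, -4L)
)

id_cols  <- 1:3                      # assumption: the first 3 columns are identifiers
val_cols <- names(df)[-id_cols]      # every remaining column is a measurement

# Drop rows whose measurement columns are all NA
keep <- rowSums(!is.na(df[, val_cols, drop = FALSE])) > 0
df   <- df[keep, , drop = FALSE]

# Strip the trailing ".<number>" to get each column's prefix ("MGW", "HEL")
grp <- sub("\\.[0-9]+$", "", val_cols)

# One rowMeans per prefix; drop = FALSE keeps a data frame even for a single column
means <- lapply(split(val_cols, grp), function(cols) {
  rowMeans(df[, cols, drop = FALSE], na.rm = TRUE)
})

result <- cbind(df[, id_cols, drop = FALSE], as.data.frame(means))
print(result)
```

Because split() orders the groups alphabetically, the mean columns come out as HEL and then MGW rather than in their original order.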

Answer 2

Here is an idea that achieves the desired output without hard-coding the variable names:

library(dplyr)
library(tidyr)

df %>%
  # remove rows where all values are NA except the first 3 columns
  filter(rowSums(is.na(.[4:length(.)])) != length(.) - 3) %>%
  # gather the data in a tidy format
  gather(key, value, -(chr:age)) %>%
  # separate the key column into label and num allowing 
  # to regroup by variables without hardcoding them
  separate(key, into = c("label", "num")) %>%
  group_by(chr, name, age, label) %>%
  # calculate the mean
  summarise(mean = mean(value, na.rm = TRUE)) %>%
  spread(label, mean)

I took the liberty of modifying your initial data to show how the logic handles special cases. For example, here we have a row (#4) where all values except the first 3 columns are NAs (this row should be removed according to your requirements) and a row with a mix of NAs and values (#5). In that case, I assumed we would want a result for MGW, since there is a value at MGW.1.

Data

#  chr name age  MGW.1 MGW.2  MGW.3 HEL.1 HEL.2 HEL.3
#1 123  abc  12  10.00    19  18.00    12 13.00   -14
#2 234  bvf  24 -13.29    13  -3.02    12 -0.12    24
#3 376  bxc  17  -6.95    10 -18.00    15  4.00    -4
#4 999  zzz  21     NA    NA     NA    NA    NA    NA
#5 888  aaa  12  10.00    NA     NA    NA    NA    NA

Which gives:

#Source: local data frame [4 x 5]
#Groups: chr, name, age [4]
#
#    chr   name   age       HEL       MGW
#* <int> <fctr> <int>     <dbl>     <dbl>
#1   123    abc    12  3.666667 15.666667
#2   234    bvf    24 11.960000 -1.103333
#3   376    bxc    17  5.000000 -4.983333
#4   888    aaa    12       NaN 10.000000
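In recent tidyr releases gather() and spread() are superseded by pivot_longer() and pivot_wider(). The following is a minimal sketch of the same idea with those functions, not the original answer's code. It assumes dplyr 1.0.0 or later and tidyr 1.0.0 or later, that the modified data (rows 999 and 888) looks as described in this answer, and that chr, name and age together identify a row uniquely.

```r
library(dplyr)
library(tidyr)

# Modified sample data: row 4 is all NA, row 5 mixes NAs and values
df <- data.frame(
  chr   = c(123L, 234L, 376L, 999L, 888L),
  name  = c("abc", "bvf", "bxc", "zzz", "aaa"),
  age   = c(12L, 24L, 17L, 21L, 12L),
  MGW.1 = c(10, -13.29, -6.95, NA, 10),
  MGW.2 = c(19, 13, 10, NA, NA),
  MGW.3 = c(18, -3.02, -18, NA, NA),
  HEL.1 = c(12, 12, 15, NA, NA),
  HEL.2 = c(13, -0.12, 4, NA, NA),
  HEL.3 = c(-14, 24, -4, NA, NA)
)

result <- df %>%
  # long format: split "MGW.1" into label = "MGW" and num = "1"
  pivot_longer(-(chr:age), names_to = c("label", "num"), names_sep = "\\.") %>%
  # drop original rows whose measurements are all NA
  group_by(chr, name, age) %>%
  filter(any(!is.na(value))) %>%
  # mean per original row and per label, ignoring NAs
  group_by(label, .add = TRUE) %>%
  summarise(mean = mean(value, na.rm = TRUE), .groups = "drop") %>%
  # back to one column per label
  pivot_wider(names_from = label, values_from = mean)

print(result)
```

As in the gather()/spread() version, a label with no non-missing values in a kept row (HEL for row 888 here) comes out as NaN, because mean() with na.rm = TRUE has nothing left to average.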

Calculating rowMeans over a range of columns (variable number)