Question

I have the following matrix, example:

      col1     col2     col3
S01LA "0.0143" "0.1286" "---"
N01AX "0.0088" "---"    "0.343"
N05AG "0.0927" "0.8692" "---"

I want to get the mean of each row. I tried changing the "---" values to NA and then using colSums:

example[example=='---'] <- NA
row_means <- rowMeans(as.numeric(example), na.rm=TRUE)

which gave me the error

Error in colSums(as.numeric(copy_specificity_df), na.rm = TRUE) : 
   'x' must be an array of at least two dimensions

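because as.numeric flattens the data frame. How can I get the mean of every row in a data frame, ignoring the elements that cannot be converted to floating-point numbers?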
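The flattening described above can be reproduced with a small sketch. The matrix m and the vector v are made-up names for illustration, not objects from the question:

```r
# A small character matrix shaped like the one in the question
# (hypothetical values, used only to show the behaviour)
m <- matrix(c("0.0143", "0.0088", "0.1286", NA), nrow = 2,
            dimnames = list(c("S01LA", "N01AX"), c("col1", "col2")))

dim(m)              # the matrix has two dimensions

# as.numeric() drops all attributes, including dim and dimnames,
# so the result is a plain vector
v <- as.numeric(m)
is.null(dim(v))     # TRUE: nothing left for rowMeans() or colSums() to work on
```

Because the result has no dimensions, rowMeans() and colSums() reject it with the "at least two dimensions" error.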

Answer 1

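If you know in advance which strings stand for NA in the raw data, you can pass them to read.table through na.strings. This effectively reads your data in as three numeric columns. It pays to get acquainted with a function's arguments.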

> dat <- read.table(text = 'col1      col2      col3
  S01LA   "0.0143"  "0.1286"  "---"                          
  N01AX "0.0088"    "---"     "0.343"                         
  N05AG "0.0927"    "0.8692"  "---"', na.strings = "---")
> dat
#         col1   col2  col3
# S01LA 0.0143 0.1286    NA
# N01AX 0.0088     NA 0.343
# N05AG 0.0927 0.8692    NA
> colSums(dat, na.rm = TRUE)
##   col1   col2   col3 
## 0.1158 0.9978 0.3430 
> rowMeans(dat, na.rm = TRUE)
##   S01LA   N01AX   N05AG 
## 0.07145 0.17590 0.48095
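na.strings takes a character vector, so more than one placeholder can be declared at once. The sketch below assumes a second, made-up placeholder "n/a" and unquoted values. Neither appears in the original data, and dat2 is a hypothetical name:

```r
# Two different missing-value markers, both declared in na.strings
# ("n/a" is an invented placeholder for this illustration)
dat2 <- read.table(text = "col1 col2 col3
S01LA 0.0143 0.1286 ---
N01AX 0.0088 n/a 0.343", na.strings = c("---", "n/a"))

sapply(dat2, is.numeric)              # every column is read as numeric
row_means2 <- rowMeans(dat2, na.rm = TRUE)
row_means2
```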

Answer 2

Here is one way.

dat <- read.table(text = 'col1      col2      col3
S01LA   "0.0143"  "0.1286"  "---"                          
N01AX "0.0088"    "---"     "0.343"                         
N05AG "0.0927"    "0.8692"  "---"')

First, convert the factors to numeric (you can ignore the warning message):

dat[] <- lapply(dat, function(x) if (is.factor(x)) as.numeric(as.character(x)) 
                                 else as.numeric(x))

#         col1   col2  col3
# S01LA 0.0143 0.1286    NA
# N01AX 0.0088     NA 0.343
# N05AG 0.0927 0.8692    NA

Second, apply colSums:

colSums(dat, na.rm = TRUE)
#   col1   col2   col3 
# 0.1158 0.9978 0.3430
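Since the question asks for row means rather than column sums, the same conversion can be followed by rowMeans. This is a sketch rather than part of the original answer. It converts every column through as.character so that it behaves the same whether read.table produced factors (older R) or character columns (R 4.0 and later), and it uses suppressWarnings to hide the expected coercion warning:

```r
dat <- read.table(text = 'col1      col2      col3
S01LA   "0.0143"  "0.1286"  "---"
N01AX "0.0088"    "---"     "0.343"
N05AG "0.0927"    "0.8692"  "---"')

# Convert every column to numeric; "---" cannot be parsed and becomes NA.
# suppressWarnings() hides the "NAs introduced by coercion" warning.
dat[] <- lapply(dat, function(x) suppressWarnings(as.numeric(as.character(x))))

# Mean of each row, skipping the NA entries
row_means <- rowMeans(dat, na.rm = TRUE)
row_means
#   S01LA   N01AX   N05AG 
# 0.07145 0.17590 0.48095
```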

Answer 3

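The way your "example" object prints, along with what you tried, suggests to me that although you call the object a data.frame, it is actually just a matrix.

What tips me off that you are actually working with a matrix?

A data.frame does not normally print quotes around strings.
as.numeric(some_data_frame) would give you an error about coercing a list to double.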
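Both hints can be checked directly. The objects m and df below are small made-up stand-ins, not the asker's data:

```r
# Hypothetical two-row stand-ins for a character matrix and a data frame
m  <- matrix(c("0.0143", "0.0088", "0.1286", "---"), nrow = 2)
df <- data.frame(col1 = c("0.0143", "0.0088"), col2 = c("0.1286", "---"))

is.matrix(m)       # TRUE
is.data.frame(m)   # FALSE
is.data.frame(df)  # TRUE

print(m)           # strings are printed with quotes
print(df)          # strings are printed without quotes

# as.numeric() on a multi-row data frame raises an error instead of
# silently flattening; capture the message rather than stopping
res <- tryCatch(as.numeric(df), error = function(e) conditionMessage(e))
res
```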

With that in mind, here is some sample data:

example <- structure(c("0.0143", "0.0088", "0.0927", "0.1286", 
                 "---", "0.8692", "---", "0.343", "---"), 
               .Dim = c(3L, 3L), 
               .Dimnames = list(c("S01LA", "N01AX", "N05AG"), 
                                c("col1", "col2", "col3")))
example
#       col1     col2     col3   
# S01LA "0.0143" "0.1286" "---"  
# N01AX "0.0088" "---"    "0.343"
# N05AG "0.0927" "0.8692" "---"

If that is the case, you can take the following approach.

example[example == "---"] <- NA   ## Replace "---" with `NA`
N <- as.numeric(example)          ## Convert to numeric. You can start here
dim(N) <- dim(example)            ## Restore the dimensions
dimnames(N) <- dimnames(example)  ## Restore the dimnames
colMeans(N, na.rm=TRUE)           ## Perform your calculation
#   col1   col2   col3 
# 0.0386 0.4989 0.3430

Note: you can actually skip the first line, but you will then get a warning.
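As an alternative sketch (not part of the original answer), assigning to storage.mode changes the element type in place while keeping dim and dimnames, so the restore steps are not needed. rowMeans then gives the per-row averages the question asks for:

```r
example <- matrix(c("0.0143", "0.0088", "0.0927", "0.1286",
                    "---", "0.8692", "---", "0.343", "---"),
                  nrow = 3,
                  dimnames = list(c("S01LA", "N01AX", "N05AG"),
                                  c("col1", "col2", "col3")))

N <- example
# Change the storage type to double; dim and dimnames are kept.
# "---" becomes NA, and suppressWarnings() hides the coercion warning.
suppressWarnings(storage.mode(N) <- "double")

row_means <- rowMeans(N, na.rm = TRUE)
row_means
#   S01LA   N01AX   N05AG 
# 0.07145 0.17590 0.48095
```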

How do I get the mean of each row when some of the numbers are missing?
