Extracting the nth value from row vectors in R

Time: 2014-12-16 18:31:49

Tags: r

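I have been looking for a way to extract the nth value (e.g., the 2nd, 5th, 7th, etc.) from each row of a data frame.

For example, I have the following columns: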

ID   Q1-2013   Q2-2013   Q3-2013  Q4-2013  Q1-2014   Q2-2014   Q3-2014  Q4-2014

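Each column holds the given values. What I want to do is pull the nth value of each row from the quarter columns (Q1-2013 to Q4-2014, i.e. columns 2-9). So, for example, if I am looking for the 2nd value in each row, the formula/function I want would extract the 2nd value from each row across those quarter columns. The formula/function should also ignore blank/NA values in each row.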

2 answers:

Answer 0: (score: 3)

Maybe this is what you are after.

I first modified the iris dataset so that each column contains some NAs:

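# overwrite 30 randomly chosen entries of every column with NA (no seed set, so results vary)
iris[] <- lapply(iris, function(x){ x[sample(150, 30, replace = FALSE)] <- NA; x})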
head(iris)
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1          5.1         3.5          1.4          NA  setosa
#2           NA          NA          1.4          NA  setosa
#3           NA          NA          1.3         0.2  setosa
#4          4.6         3.1          1.5          NA  setosa
#5          5.0         3.6          1.4         0.2  setosa
#6          5.4          NA          1.7         0.4  setosa

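Then, to extract the second non-empty, non-NA entry of each row, you can use apply (I know, it is not recommended for data frames, but it does the dirty work):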

apply(iris, 1, function(x) x[which(!is.na(x) & x != "")[2]])
#  [1] "3.5"       "setosa"    "0.2"       "3.1"       "3.6"       "1.7"       "3.4"       "3.4"       "2.9"       "3.1"       "setosa"   
 #[12] "3.4"       "1.4"       "1.1"       "1.2"       "4.4"       "3.9"       "3.5"       "3.8"       "3.8"       "0.2"       "3.7"      
 #[23] "3.6"       "1.7"       "1.9"       "3.0"       "3.4"       "1.5"       "3.4"       "3.2"       "3.1"       "3.4"       "4.1"      
 #[34] "4.2"       "3.1"       "3.2"       "3.5"       "3.6"       "setosa"    "1.5"       "1.3"       "2.3"       "1.3"       "0.6"      
 #[45] "0.4"       "3.0"       "3.8"       "3.2"       "3.7"       "3.3"       "3.2"       "3.2"       "1.5"       "2.3"       "2.8"      
 #[56] "2.8"       "3.3"       "2.4"       "4.6"       "1.4"       "2.0"       "3.0"       "1.0"       "2.9"       "2.9"       "3.1"      
 #[67] "3.0"       "2.7"       "4.5"       "3.9"       "3.2"       "4.0"       "2.5"       "4.7"       "4.3"       "3.0"       "2.8"      
 #[78] "5.0"       "2.9"       "3.5"       "3.8"       "2.4"       "2.7"       "2.7"       "3.0"       "3.4"       "3.1"       "1.3"      
 #[89] "4.1"       "1.3"       "2.6"       "3.0"       "2.6"       "2.3"       "4.2"       "3.0"       "2.9"       "2.9"       "2.5"      
#[100] "2.8"       "3.3"       "2.7"       "3.0"       "2.9"       "3.0"       "3.0"       "4.5"       "2.9"       "5.8"       "3.6"      
#[111] "3.2"       "1.9"       "5.5"       "2.0"       "5.1"       "3.2"       "5.5"       "3.8"       "virginica" "1.5"       "3.2"      
#[122] "2.8"       "2.8"       "2.7"       "2.1"       "6.0"       "2.8"       "3.0"       "2.8"       "5.8"       "2.8"       "3.8"      
#[133] "5.6"       "1.5"       "2.6"       "3.0"       "5.6"       "5.5"       "4.8"       "3.1"       "5.6"       "5.1"       "2.7"      
#[144] "3.2"       "3.3"       "3.0"       "2.5"       "5.2"       "5.4"       "3.0"      
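
To make the indexing idiom inside that apply call easier to follow, here is a minimal sketch on a single vector. The vectors x and short are made up for illustration and are not part of the original answer. It also shows what happens when a row has fewer than two usable entries: the index is NA, so the result is NA.

```r
# Made-up example vector: one NA, one blank string, two usable entries.
x <- c(NA, "a", "", "b")

# Positions of entries that are neither NA nor blank.
keep <- which(!is.na(x) & x != "")   # positions 2 and 4

# The second usable entry.
second <- x[keep[2]]                 # "b"

# With fewer than two usable entries the index is NA, so the result is NA.
short <- c("a", NA, "")
missing_second <- short[which(!is.na(short) & short != "")[2]]
```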

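Because apply first converts the data frame to a matrix, all columns are coerced to a single type, character in this case. You can convert the result to whatever you want afterwards (but note that you cannot convert the output vector directly to numeric, because it contains some character strings such as "setosa").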
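
One way to sidestep the coercion, sketched here under the assumption that the data looks like the table in the question (an ID column followed by numeric quarter columns), is to pass only the numeric columns to apply. The quarters data frame, its values, and the helper name nth_non_na are hypothetical and only serve as an illustration.

```r
# Hypothetical data in the shape described in the question (values are made up).
quarters <- data.frame(
  ID        = c("a", "b", "c"),
  `Q1-2013` = c(10, NA, 5),
  `Q2-2013` = c(NA, 20, 6),
  `Q3-2013` = c(30, NA, NA),
  `Q4-2013` = c(40, 50, NA),
  check.names = FALSE
)

# Return the n-th non-NA entry of a vector, or NA if there are fewer than n.
nth_non_na <- function(x, n) {
  x[!is.na(x)][n]
}

# Drop the ID column so that apply() works on an all-numeric matrix
# and the result stays numeric.
second <- apply(quarters[, -1], 1, nth_non_na, n = 2)
second   # the values 30, 50 and 6
```

Because the ID column is excluded before the call, no character coercion happens and the result can be used in arithmetic directly.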

Answer 1: (score: 0)

You could also use the convenient naLast function from library(SOfun):

library(SOfun)
dat[dat==''] <- NA #convert all `blank` cells to `NA`
n <- 2 # the row/column index that needs to be extracted
naLast(dat, by='col')[n,] #get the 2nd non-empty/non-NA element for each column
#V1  V2  V3  V4  V5 
#"G" "B" "B" "B" "C" 

This gives the same result as apply:

 apply(dat, 2, function(x) x[which(!is.na(x) & x!='')[2]])
 #V1  V2  V3  V4  V5 
 #"G" "B" "B" "B" "C" 

You can also specify by='row':

naLast(dat, by='row')[,n] #get the 2nd non-empty/nonNA element for each row
#  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20 
#"G" "D" "B" "G" "E" "B" "J" "F" "F" "A" "H" "C" "A" "D" "H" "D" "J" "C" "A" "A" 
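
If installing SOfun is not an option, the by-row idea can be approximated in base R. The following is only a sketch of the general "move missing values to the end" technique, not SOfun's actual implementation. The function name shift_non_missing_left and the small data frame dat2 are hypothetical, and the helper assumes a data frame with at least two columns.

```r
# Small made-up data frame; "" marks a blank cell.
dat2 <- data.frame(
  V1 = c("A", NA, "C"),
  V2 = c("", "B", NA),
  V3 = c("D", "E", NA),
  V4 = c("G", "", ""),
  stringsAsFactors = FALSE
)

# Move the non-missing entries of every row to the front and pad with NA.
# Assumes at least two columns, so that apply() returns a matrix.
shift_non_missing_left <- function(df) {
  m <- as.matrix(df)
  m[!is.na(m) & m == ""] <- NA            # treat blank strings as missing
  out <- t(apply(m, 1, function(x) {
    keep <- x[!is.na(x)]
    c(keep, rep(NA, length(x) - length(keep)))
  }))
  dimnames(out) <- NULL
  out
}

n <- 2
shifted <- shift_non_missing_left(dat2)
shifted[, n]   # 2nd non-empty/non-NA element of each row: "D", "E", NA
```

Once the missing values sit at the end of each row, picking the nth value is plain column indexing, and rows with fewer than n usable entries yield NA.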

Data

set.seed(25)
dat <- as.data.frame(matrix(sample(c(NA,'',LETTERS[1:10]), 
        20*5, replace=TRUE), ncol=5), stringsAsFactors=FALSE)

You can install the package with:

 library(devtools)
 install_github("mrdwab/SOfun")