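I have been looking for a way to extract the nth value (for example, the 2nd, 5th, or 7th) from each row of a data frame.
For example, I have the following columns: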
ID Q1-2013 Q2-2013 Q3-2013 Q4-2013 Q1-2014 Q2-2014 Q3-2014 Q4-2014
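Each column holds values. What I want is to pull the nth value of each row from the quarterly columns (columns 2-9). For example, if I am looking for the second value of each row, the formula/function should extract the second value of each row from columns 2-9 (Q1-2013 through Q4-2014). The formula/function should also ignore blank/NA values in each row.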
Answer 0 (score: 3)
Maybe this is what you are after.
I first modified the iris dataset so that each column contains some NAs:
iris[] <- lapply(iris, function(x) { x[sample(150, 30, FALSE)] <- NA; x })
head(iris)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1 5.1 3.5 1.4 NA setosa
#2 NA NA 1.4 NA setosa
#3 NA NA 1.3 0.2 setosa
#4 4.6 3.1 1.5 NA setosa
#5 5.0 3.6 1.4 0.2 setosa
#6 5.4 NA 1.7 0.4 setosa
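Then, to extract the second non-empty, non-NA entry of each row, you can use apply (I know it is not recommended for data frames, but it gets the dirty work done):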
apply(iris, 1, function(x) x[which(!is.na(x) & x != "")[2]])
# [1] "3.5" "setosa" "0.2" "3.1" "3.6" "1.7" "3.4" "3.4" "2.9" "3.1" "setosa"
#[12] "3.4" "1.4" "1.1" "1.2" "4.4" "3.9" "3.5" "3.8" "3.8" "0.2" "3.7"
#[23] "3.6" "1.7" "1.9" "3.0" "3.4" "1.5" "3.4" "3.2" "3.1" "3.4" "4.1"
#[34] "4.2" "3.1" "3.2" "3.5" "3.6" "setosa" "1.5" "1.3" "2.3" "1.3" "0.6"
#[45] "0.4" "3.0" "3.8" "3.2" "3.7" "3.3" "3.2" "3.2" "1.5" "2.3" "2.8"
#[56] "2.8" "3.3" "2.4" "4.6" "1.4" "2.0" "3.0" "1.0" "2.9" "2.9" "3.1"
#[67] "3.0" "2.7" "4.5" "3.9" "3.2" "4.0" "2.5" "4.7" "4.3" "3.0" "2.8"
#[78] "5.0" "2.9" "3.5" "3.8" "2.4" "2.7" "2.7" "3.0" "3.4" "3.1" "1.3"
#[89] "4.1" "1.3" "2.6" "3.0" "2.6" "2.3" "4.2" "3.0" "2.9" "2.9" "2.5"
#[100] "2.8" "3.3" "2.7" "3.0" "2.9" "3.0" "3.0" "4.5" "2.9" "5.8" "3.6"
#[111] "3.2" "1.9" "5.5" "2.0" "5.1" "3.2" "5.5" "3.8" "virginica" "1.5" "3.2"
#[122] "2.8" "2.8" "2.7" "2.1" "6.0" "2.8" "3.0" "2.8" "5.8" "2.8" "3.8"
#[133] "5.6" "1.5" "2.6" "3.0" "5.6" "5.5" "4.8" "3.1" "5.6" "5.1" "2.7"
#[144] "3.2" "3.3" "3.0" "2.5" "5.2" "5.4" "3.0"
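Because apply first converts the data frame to a matrix, all columns are coerced to the same type, character in this case. You can convert the result to whatever you need afterwards (but note that you cannot convert the output vector directly to numeric, because it contains some strings such as "setosa").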
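One way to sidestep the character coercion is to pass only the numeric columns to apply, so the intermediate matrix stays numeric. The following is a minimal sketch under assumptions: the data frame `df`, its values, and the helper name `nth_non_na` are hypothetical and only mimic the layout described in the question.

```r
# Hypothetical data in the asker's layout: an ID column plus quarterly columns.
df <- data.frame(
  ID = c("a", "b", "c"),
  `Q1-2013` = c(10, NA, NA),
  `Q2-2013` = c(NA, 20, NA),
  `Q3-2013` = c(30, 40, 5),
  `Q4-2013` = c(50, NA, NA),
  check.names = FALSE
)

# Hypothetical helper: n-th non-NA value of each row.
# Rows with fewer than n non-NA values yield NA.
nth_non_na <- function(data, n) {
  m <- as.matrix(data)  # stays numeric because every column is numeric
  unname(apply(m, 1, function(x) x[!is.na(x)][n]))
}

# Drop the ID column so that only the quarterly columns are considered.
second <- nth_non_na(df[-1], 2)
second
# [1] 30 40 NA
```

Because the ID column is excluded before the conversion, the result is a numeric vector and needs no later conversion.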
Answer 1 (score: 0)
You could also use the convenient naLast function from library(SOfun):
library(SOfun)
dat[dat==''] <- NA #convert all `blank` cells to `NA`
n <- 2 # the row/column index that needs to be extracted
naLast(dat, by='col')[n,] #get the 2nd non-empty/non-NA element for each column
#V1 V2 V3 V4 V5
#"G" "B" "B" "B" "C"
With apply:
apply(dat, 2, function(x) x[which(!is.na(x) & x!='')[2]])
#V1 V2 V3 V4 V5
#"G" "B" "B" "B" "C"
You can also specify by='row':
naLast(dat, by='row')[,n] #get the 2nd non-empty/nonNA element for each row
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
#"G" "D" "B" "G" "E" "B" "J" "F" "F" "A" "H" "C" "A" "D" "H" "D" "J" "C" "A" "A"
The sample data used above:
set.seed(25)
dat <- as.data.frame(matrix(sample(c(NA,'',LETTERS[1:10]),
    20*5, replace=TRUE), ncol=5), stringsAsFactors=FALSE)
You can install the package with:
library(devtools)
install_github("mrdwab/SOfun")
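If installing SOfun is not an option, the same per-row and per-column extraction can be sketched in base R. This is only an illustration of the idea, not SOfun's implementation: the data frame `dat2` and the helper `nth_filled` are hypothetical names introduced here, and the helper treats both NA and empty strings as missing.

```r
# Hypothetical character data containing both blanks and NA.
dat2 <- data.frame(
  V1 = c("A", "", NA),
  V2 = c("", "B", "C"),
  V3 = c("D", "E", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: n-th entry that is neither NA nor "" along a margin
# (margin = 1 works by row, margin = 2 by column).
nth_filled <- function(data, n, margin = 1) {
  m <- as.matrix(data)
  unname(apply(m, margin, function(x) x[!is.na(x) & x != ""][n]))
}

by_row <- nth_filled(dat2, 2, margin = 1)
by_col <- nth_filled(dat2, 2, margin = 2)
by_row
# [1] "D" "E" NA
by_col
# [1] NA  "C" "E"
```

Rows or columns with fewer than n filled entries return NA instead of raising an error.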