How to ignore consecutive leading zeros in R

Time: 2015-02-12 06:00:03

Tags: r

I want to calculate the mean and standard deviation of each row in a dataset. However, I want to ignore the leading zeros.

Row 1: 0 0 0 0 9 0 8 5 
Row 2: 0 0 3 5 6 0 0 0

I want to compute the mean of [9 0 8 5] and of [3 5 6 0 0 0].

Is there a simple way to do this with an R data frame?

4 answers:

Answer 0: (score: 4)

Perhaps not the most elegant, but in this case you can use cumsum.

Try:

> apply(mydf, 1, function(x) mean(x[cumsum(x) > 0]))
[1] 5.500000 2.333333

You can extend this idea by moving the function outside of apply, so that you can customize what it computes, like this:

myFun <- function(x) {
  x <- x[cumsum(x) > 0]
  c(mean = mean(x), sd = sd(x))
}

apply(mydf, 1, myFun)
#          [,1]     [,2]
# mean 5.500000 2.333333
# sd   4.041452 2.732520
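
Note that cumsum(x) > 0 relies on the values being non-negative, as they are in the question's data. A minimal sketch, assuming rows might also contain negative values (an assumption, not something the question states), that counts non-zero entries instead. The helper name trim_leading_zeros and the sample matrix m are hypothetical:

```r
# Hypothetical helper: drop the leading run of zeros from a numeric vector.
# cumsum(x != 0) counts the non-zero entries seen so far, so it stays at 0
# only across the leading zeros, whatever the sign of later values.
trim_leading_zeros <- function(x) x[cumsum(x != 0) > 0]

# Illustrative data: the second row contains negative values
m <- rbind(c(0, 0, 0, 0, 9, 0, 8, 5),
           c(0, 0, -3, 5, -6, 0, 0, 0))

res <- apply(m, 1, function(x) {
  x <- trim_leading_zeros(x)
  c(mean = mean(x), sd = sd(x))
})
res
```

This sketch does not handle NA values; a row containing NA would need extra treatment before calling cumsum.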

Answer 1: (score: 4)

How about using the vectorized rowMeans function?

rowMeans(replace(dat, col(dat) < max.col(dat != 0, ties.method="first"), NA), na.rm=TRUE)
#[1] 5.500000 2.333333

If speed is a concern on large datasets, this will be much faster than using apply. If not, apply is certainly more readable.

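Unfortunately, this approach costs some flexibility, because a rowX function does not exist for everything. The matrixStats package does have rowSds, which is also fast: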

library(matrixStats)
rowSds(as.matrix(replace(dat, col(dat) < max.col(dat != 0, ties.method="first"), NA)), na.rm=TRUE)
#[1] 4.041452 2.732520
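
To see why the masking works, it may help to build it in steps. A sketch using the question's two rows; the object names dat, first_nz, mask, and means are illustrative:

```r
dat <- data.frame(matrix(c(0, 0, 0, 0, 9, 0, 8, 5,
                           0, 0, 3, 5, 6, 0, 0, 0),
                         nrow = 2, byrow = TRUE))

# Column index of the first non-zero value in each row:
# dat != 0 is a logical matrix, and max.col() with ties.method = "first"
# returns the position of the first maximum (the first TRUE) per row.
first_nz <- max.col(dat != 0, ties.method = "first")

# col(dat) holds each cell's column number. Comparing it with first_nz
# (recycled down the rows) flags every cell left of the first non-zero value.
mask <- col(dat) < first_nz

# Flagged cells become NA and are then skipped by na.rm = TRUE.
means <- rowMeans(replace(dat, mask, NA), na.rm = TRUE)
means
```

One consequence of this design: for a row with no non-zero values, every column ties, so first_nz is 1, nothing is masked, and the row mean is 0.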

Answer 2: (score: 3)

Try

apply(df1, 1, function(x)
      mean(x[Position(function(y) y >0, x):length(x)]))
#[1] 5.500000 2.333333
apply(df1, 1, function(x) sd(x[Position(function(y)
     y >0, x):length(x)]))
#[1] 4.041452 2.732520

We can wrap it in a function:

f1 <- function(dat, ...){
   args <- as.list(match.call())[-(1:2)]
   res <- sapply(args, function(FUN) apply(dat, 1, function(x){
            x <- x[Position(function(y) y > 0 & !is.na(y), x):length(x)]
          eval(FUN)(x, na.rm=TRUE)
     }
   ))

  colnames(res) <- args
  res
 }

f1(df1, mean)
#        mean
#[1,] 5.500000
#[2,] 2.333333
f1(df1, mean, sd, median)
#        mean       sd median
#[1,] 5.500000 4.041452    6.5
#[2,] 2.333333 2.732520    1.5

f1(df2, mean, sd)
#       mean       sd
#[1,] 7.333333 2.081666
#[2,] 1.500000 3.000000

f1(df3, mean, sd)
#        mean       sd
#[1,] 7.333333 2.081666
#[2,] 1.500000 3.000000

Data

df1 <- structure(list(v1 = c(0L, 0L), v2 = c(0L, 0L), v3 = c(0L, 3L), 
v4 = c(0L, 5L), v5 = c(9L, 6L), v6 = c(0L, 0L), v7 = c(8L, 
0L), v8 = c(5L, 0L)), .Names = c("v1", "v2", "v3", "v4", 
"v5", "v6", "v7", "v8"), class = "data.frame", row.names = c(NA, -2L))

df2 <- structure(list(v1 = c(0L, 0L), v2 = c(0L, 0L), v3 = c(NA, 0), 
v4 = c(0, 0), v5 = c(9L, 6L), v6 = c(NA, 0L), v7 = c(8L, 
0L), v8 = c(5L, 0L)), .Names = c("v1", "v2", "v3", "v4", 
"v5", "v6", "v7", "v8"), row.names = c(NA, -2L), class = "data.frame")

df3 <- structure(list(v1 = c(0L, 0L), v2 = c(0L, 0L), v3 = c(0, 0), 
v4 = c(0, 0), v5 = c(9L, 6L), v6 = c(NA, 0L), v7 = c(8L, 
0L), v8 = c(5L, 0L)), .Names = c("v1", "v2", "v3", "v4", 
"v5", "v6", "v7", "v8"), row.names = c(NA, -2L), class = "data.frame")
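
One edge case for the Position approach: when a row has no positive value, Position() returns NA, and NA:length(x) then raises an error. A minimal sketch of one way to guard against that by returning NA for such rows. The helper name mean_after_first_pos and the choice to return NA are assumptions, not part of the answer above:

```r
# Hypothetical helper: mean of x from its first positive value onward.
mean_after_first_pos <- function(x) {
  # isTRUE() makes NA > 0 count as "not positive", so NA entries are skipped
  start <- Position(function(y) isTRUE(y > 0), x)
  if (is.na(start)) return(NA_real_)   # no positive value in this row
  mean(x[start:length(x)], na.rm = TRUE)
}

# Illustrative data: the second row is all zeros
m <- rbind(c(0, 0, 0, 0, 9, 0, 8, 5),
           c(0, 0, 0, 0, 0, 0, 0, 0))
out <- apply(m, 1, mean_after_first_pos)
out
```

Whether an all-zero row should yield NA, 0, or an error depends on the analysis; NA is used here only to make the case visible.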

Answer 3: (score: 0)

Try:

vals <- c(0, 0, 0, 0, 9, 0, 8, 5
        , 0, 0, 3, 5, 6, 0, 0, 0)   # named vals to avoid masking base::c
df <- as.data.frame(matrix(vals, 2, 8, byrow = TRUE))

for ( i in 1:2 ) { 
  x <- sapply(df[i, 1:8], as.numeric)
  y <- match(NA,match(x, 0))   # index of the first non-zero value
  z <- x[y:8]                  # values from the first non-zero onward
  df[i,"Avg"] <- mean(z)
  df[i,"Sd"] <- sd(z) 
}

rm(vals,x,y,z)

df

#   V1 V2 V3 V4 V5 V6 V7 V8      Avg       Sd
# 1  0  0  0  0  9  0  8  5 5.500000 4.041452
# 2  0  0  3  5  6  0  0  0 2.333333 2.732520
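
The nested match() call in the loop is compact but easy to misread. A small sketch that unpacks it on the question's second row and compares it with which(), which is offered here only as an equivalent way to locate the first non-zero value; the names inner, first_nz, and alt are illustrative:

```r
x <- c(0, 0, 3, 5, 6, 0, 0, 0)

# match(x, 0) gives 1 wherever x equals 0 and NA everywhere else
inner <- match(x, 0)

# match(NA, inner) returns the position of the first NA,
# which is the position of the first non-zero value
first_nz <- match(NA, inner)

# The same index via which(): positions of non-zero values, then the first
alt <- which(x != 0)[1]

x[first_nz:length(x)]
```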