Question

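There is probably a simple explanation for what I'm doing wrong, but I've been working on this for a good part of today and still can't get it to work. I thought this would be a walk in the park, but my code isn't working as expected.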

So for this example, let's say I have a data frame as follows.

df
Row#   user      columnB    
1        1          NA        
2        1          NA        
3        1          NA        
4        1          31        
5        2          NA        
6        2          NA        
7        2          15        
8        3          18        
9        3          16       
10       3          NA
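
For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is only a sketch: it assumes `Row#` in the listing is just the row index rather than a real column, and it takes the column names `user` and `columnB` from the listing.

```r
# Rebuild the example data; Row# is assumed to be the row index, not a column
df <- data.frame(
  user    = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  columnB = c(NA, NA, NA, 31, NA, NA, 15, 18, 16, NA)
)
df
```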

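Basically, I want to create a new column that uses the first (as well as last) function (from the TTR package) to get the first non-NA value for each user. So the data frame I want is this.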

df
Row#   user      columnB    firstValue
1        1          NA        31
2        1          NA        31 
3        1          NA        31
4        1          31        31
5        2          NA        15
6        2          NA        15 
7        2          15        15
8        3          18        18
9        3          16        18
10       3          NA        18

I've looked around, mainly with Google, but I couldn't find an answer to this exact problem.

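Here is some of the code I've tried, none of which gave me the result I wanted (note that I'm reproducing this from memory, so there were many more variations, but these are the general forms I've been trying).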

    df$firstValue<-ave(df$columnB,df$user,FUN=first,na.rm=True)
    df$firstValue<-ave(df$columnB,df$user,FUN=function(x){x,first,na.rm=True})
    df$firstValue<-ave(df$columnB,df$user,FUN=function(x){first(x,na.rm=True)})
    df$firstValue<-by(df,df$user,FUN=function(x){x,first,na.rm=True})

These fail: they just give the first value of each group, which is NA.

Again, these are just a few examples off the top of my head. I played around with na.rm and tried na.exclude, na.omit, na.action(na.omit), and so on.

Any help would be greatly appreciated. Thanks.

Answer 1

A data.table solution:

require(data.table)
DT <- data.table(df, key="user")
DT[, firstValue := na.omit(columnB)[1], by=user]

Answer 2

Here is a solution using plyr:

ddply(df, .(user), transform, firstValue=na.omit(columnB)[1])

which gives:

  Row user columnB firstValue
1   1    1      NA         31
2   2    1      NA         31
3   3    1      NA         31
4   4    1      31         31
5   5    2      NA         15
6   6    2      NA         15
7   7    2      15         15
8   8    3      18         18
9   9    3      16         18

If you want to capture the last value instead, you can do the following:

ddply(df, .(user), transform, lastValue=tail(na.omit(columnB),1))
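
One detail worth knowing about the two variants above: for a group that is entirely NA, `na.omit(x)[1]` yields NA, whereas `tail(na.omit(x), 1)` yields a zero-length vector. Below is a minimal base R sketch of two helpers that both fall back to NA. The names `first_non_na` and `last_non_na` are made up for illustration and are not part of plyr.

```r
# Hypothetical helper names; base R only
first_non_na <- function(x) x[!is.na(x)][1]        # indexing past the end gives NA
last_non_na  <- function(x) rev(x[!is.na(x)])[1]   # same idea, taken from the end

x <- c(NA, 18, 16, NA)
first_non_na(x)
last_non_na(x)

# An all-NA group: tail() on the emptied vector has length 0,
# while the helper still returns a single NA
length(tail(na.omit(c(NA, NA)), 1))
last_non_na(c(NA, NA))
```

Either helper could presumably be used in place of the `na.omit(...)` expression inside `transform`, but that combination has not been checked here.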

Answer 3

Using data.table:

library(data.table)
DT <- data.table(df, key="user")
DT <- setnames(DT[unique(DT[!is.na(columnB), list(columnB), by="user"])], "columnB.1", "first")

Answer 4

Using a very small helper function

finite <- function(x) x[is.finite(x)]

here is a one-liner that uses only standard R functions:

df <- cbind(df, firstValue = unlist(sapply(unique(df[,1]), function(user) rep(finite(df[df[,1] == user,2])[1], sum(df[,1] == user)))))

For a better overview, here is the one-liner expanded into a "multi-liner":

# for each user, find the first finite (in this case non-NA) value of the second column and replicate it as many times as the user has rows
# then, the results of all users are joined into one vector (unlist) and appended to the data frame as column
df <- cbind(
  df,
  firstValue = unlist(
    sapply(
       unique(df[,1]),
       function(user) {
         rep(
           finite(df[df[,1] == user,2])[1],
           sum(df[,1] == user)
         )
       }
    )
  )
)
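
The same result can also be obtained with `ave`, which is where the question started. This is a sketch and not part of the answer above: it assumes the columns are named `user` and `columnB` as in the question, and it drops the NAs inside the function instead of relying on an `na.rm` argument.

```r
# Example data from the question (Row# assumed to be just the row index)
df <- data.frame(
  user    = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  columnB = c(NA, NA, NA, 31, NA, NA, 15, 18, 16, NA)
)

# ave() applies FUN to each group and writes the result back to that
# group's positions, recycling the single value over the group
df$firstValue <- ave(df$columnB, df$user,
                     FUN = function(x) x[!is.na(x)][1])
df
```

Because `ave` assigns each group's value back to that group's own rows, this does not depend on the rows of one user being adjacent, whereas concatenating per-user results with `unlist` assumes the rows are already grouped by user.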

Using "FUN = first" while skipping NA values
