Dataframe: column that checks the previous row's value for each row and fills in a value

Time: 2014-10-27 15:11:21

Tags: r loops if-statement dataframe

I have tried a lot to solve the following problem, and I have read a great deal about it, but I still can't get it to work.

See this example:

time <- sample(1:300, 20)
test <- c (0,0,0, NA, 0, 0, 3, 0, 0, NA, 0,0, 3, 0, 0, NA, 0, 0, 3, 0)
take <- rep(NA, 20)
df <-data.frame(time, test, take)
> head(df, 8)
  time test take
1  271    0   NA
2  147    0   NA
3  277    0   NA
4  247   NA   NA
5   82    0   NA
6  133    0   NA
7  231    3   NA
8  110    0   NA

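Now I want to fill in values in the last column (take). The value there depends on a condition on the second column (test). If test is NA or 3, take can stay empty. So far so good, but my problem is the value 0: if test in the previous row is 0, the row should get "a"; if it is 3, it should get "b"; in all other cases it should get "c".

So the output should look like this: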

> head(df, 8)
  time test take
1  271    0    c
2  147    0    a
3  277    0    a
4  247   NA   NA
5   82    0    c
6  133    0    a
7  231    3   NA
8  110    0    b

Thanks for your help!

3 answers:

Answer 0 (score: 1)

Try:

is0 <- which(df$test == 0)   # indices of rows where test is 0
df[is0, "take"] <- "c"       # "c" is the default for every test == 0 row
for (i in setdiff(is0, 1)) { # skip row 1, which has no previous row
    prev <- df$test[i - 1]
    if ((i - 1) %in% is0) {
        df$take[i] <- "a"    # previous row has test == 0
    } else if (!is.na(prev) && prev == 3) {
        df$take[i] <- "b"    # previous row has test == 3 (and is not NA)
    }
}

Answer 1 (score: 1)

You could also do:

indx <- c(FALSE,!df$test[-nrow(df)] & !is.na(df$test)[-nrow(df)])
indx1 <- c(FALSE,df$test[-nrow(df)]==3 & !is.na(df$test)[-nrow(df)])
indx2 <- df$test==3|is.na(df$test)

df$take <- c('c','a','b', NA)[as.numeric(factor(1+2*indx+4*indx1+8*indx2))]

df$take
#[1] "c" "a" "a" NA  "c" "a" NA  "b" "a" NA  "c" "a" NA  "b" "a" NA  "c" "a" NA
#[20] "b"
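
One thing to keep in mind with the encoding above: `as.numeric(factor(...))` ranks the codes that actually occur, so the lookup into `c('c','a','b', NA)` only lines up when the codes for "c", "a" and "b" (1, 3 and 5) are all present in the data. A more explicit variant of the same vectorized idea might look like the sketch below. The names `fill_take` and `prev` are illustrative and not part of the answer, and `time` is replaced by a simple row counter to keep the example reproducible.

```r
# Minimal base R sketch; fill_take() and prev are hypothetical helper names.
test <- c(0, 0, 0, NA, 0, 0, 3, 0, 0, NA, 0, 0, 3, 0, 0, NA, 0, 0, 3, 0)
df <- data.frame(time = seq_along(test), test = test)

fill_take <- function(test) {
  # Value of test in the previous row; the first row has no predecessor.
  prev <- c(NA, head(test, -1))
  # Only rows where test is 0 receive a label; NA and 3 stay NA.
  is_zero <- !is.na(test) & test == 0
  take <- rep(NA_character_, length(test))
  take[is_zero] <- "c"                             # default label
  take[is_zero & !is.na(prev) & prev == 0] <- "a"  # previous test is 0
  take[is_zero & !is.na(prev) & prev == 3] <- "b"  # previous test is 3
  take
}

df$take <- fill_take(df$test)
head(df, 8)
```

Because each label is assigned by its own logical mask, the result does not depend on which combinations happen to appear in the data.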

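Answer 2 (score: 0)

Using the dplyr package, you can split the problem into two parts.

Part 1: write a function that encapsulates your logic for filling take based on the previous row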

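return_value_based_on_previous_row <- function(x, lagged) {
    # test is NA or 3: take stays empty
    if (is.na(x) || x == 3) {
        return(NA_character_)
    }
    # test is 0: decide from the previous row's test value
    if (is.na(lagged)) {
        temp <- "c"   # first row, or previous test is NA
    } else if (lagged == 0) {
        temp <- "a"
    } else if (lagged == 3) {
        temp <- "b"
    } else {
        temp <- "c"   # any other previous value
    }
    return(temp)
}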

Part 2: use lag and mutate to process df row by row

library(dplyr)

df <-
    df %>%
    mutate(lag_test = lag(test)) %>% # make temp column which contains previous value of test
    rowwise() %>% # makes the following mutate work on each row separately
    mutate(take = return_value_based_on_previous_row(test, lag_test)) %>%
    select(-lag_test) # remove temp column

This gives:

> df
   time test take
1   164    0    c
2    36    0    a
3   279    0    a
4   255   NA   NA
5   241    0    c
6   188    0    a
7   117    3   NA
8    75    0    b
9    60    0    a
10  175   NA   NA
11  238    0    c
12  184    0    a
13  272    3   NA
14  215    0    b
15   49    0    a
16  204   NA   NA
17  291    0    c
18  218    0    a
19  197    3   NA
20  138    0    b
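
For what it's worth, the same rule can probably also be written without `rowwise()` by using `case_when()`, which is available in more recent dplyr versions than the one this answer was written against. The following is only a sketch under that assumption; the temporary column name `prev` is made up here, and `time` is again a plain row counter instead of a random sample.

```r
library(dplyr)

test <- c(0, 0, 0, NA, 0, 0, 3, 0, 0, NA, 0, 0, 3, 0, 0, NA, 0, 0, 3, 0)
df <- data.frame(time = seq_along(test), test = test)

df <- df %>%
  mutate(
    prev = lag(test),  # previous row's test; NA for the first row
    take = case_when(
      is.na(test) | test == 3 ~ NA_character_,  # leave take empty
      is.na(prev)             ~ "c",            # no usable previous value
      prev == 0               ~ "a",
      prev == 3               ~ "b",
      TRUE                    ~ "c"             # any other previous value
    )
  ) %>%
  select(-prev)  # drop the temporary column

head(df, 8)
```

The conditions are checked in order, so the "test is NA or 3" case is handled first and the remaining branches only ever apply to rows where test is 0.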