Create a new column based on a condition

Time: 2018-06-13 16:12:40

Tags: r dplyr data.table

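I have a data frame in the following format.

When a user buys a new item, the purchase gets a unique id value. If the same user buys another item, the child column holds the id of that user's previous purchase.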

df <- data.frame(id= c('s123','s1004','s1009','s1010'),child = c("",'s123','s1004',""))

> df
     id child
1  s123      
2 s1004  s123
3 s1009 s1004
4 s1010      

Now I want to create a new column, parent, that holds the initial id value of each chain:

expect_df <- data.frame(id= c('s123','s1004','s1009','s1010'),child = c("",'s123','s1004',""),parent = c('s123','s123','s123','s1010'))

> expect_df

     id child parent
1  s123         s123
2 s1004  s123   s123
3 s1009 s1004   s123
4 s1010        s1010

2 answers:

Answer 0 (score: 1)

Data (make sure your inputs are character vectors, not factors, and that your "" values are NA):

df <- data.frame(id= c('s123','s1004','s1009','s1010'),child = c(NA,'s123','s1004',NA),stringsAsFactors = F)
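If you are starting from the data frame in the question (empty strings, and possibly factor columns depending on your R version), a minimal sketch of that conversion could look like the following. The names `df_raw` and `df_clean` are only illustrative.

```r
# The data frame as given in the question: "" marks "no previous id"
df_raw <- data.frame(id = c('s123', 's1004', 's1009', 's1010'),
                     child = c("", 's123', 's1004', ""))

# Convert every column to character (a no-op if they already are)
df_clean <- df_raw
df_clean[] <- lapply(df_clean, as.character)

# Replace empty strings in child with NA
df_clean$child[df_clean$child == ""] <- NA
```

After this, `df_clean` should have the same contents as the `df` defined above.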

Code:

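df$parent <- NA

repeat {
    # Start a new group from the first row that has no parent yet
    sid <- df$id[which(is.na(df$parent))[1]]

    # Flag rows that share a value with the group, growing the group (sid) as rows match
    in_group <- apply(df, 1, function(x) {
        x <- na.omit(x)
        if (any(x %in% sid)) { sid <<- c(sid, x); TRUE } else FALSE
    })
    df$parent[in_group] <- sid[1]

    # Stop once every row has a parent
    if (all(!is.na(df$parent))) break
}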

Result:

#      id child parent
# 1  s123  <NA>   s123
# 2 s1004  s123   s123
# 3 s1009 s1004   s123
# 4 s1010  <NA>  s1010
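As a side note, here is a rough sketch of the same idea without `apply` or `<<-`: follow the child links for all rows at once with `match()` until nothing changes. `find_root` and `df_chain` are hypothetical names, not from any package, and the sketch assumes ids are unique and the links contain no cycles (a cycle would make the loop run forever).

```r
df_chain <- data.frame(id = c('s123', 's1004', 's1009', 's1010'),
                       child = c(NA, 's123', 's1004', NA),
                       stringsAsFactors = FALSE)

# Hypothetical helper: walk each id back to the start of its chain
find_root <- function(id, child) {
  parent <- id
  repeat {
    # Previous id of each current candidate (NA when the candidate has none)
    prev <- child[match(parent, id)]
    step <- !is.na(prev)
    if (!any(step)) break
    # Move one link back for every row that can still move
    parent[step] <- prev[step]
  }
  parent
}

df_chain$parent <- find_root(df_chain$id, df_chain$child)
```

Unlike the loop above, this should not depend on the rows being ordered so that earlier purchases come first.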

Answer 1 (score: 0)

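# df1 is not defined in the answer; inferred from the output below as df plus one extra row
df1 <- data.frame(id = c('s123','s1004','s1009','s1010','s1103'),
                  child = c(NA,'s123','s1004',NA,'s1009'), stringsAsFactors = FALSE)

# Follow the child link recursively until reaching an id with no previous id
m <- function(x, df) {
  n <- with(df, child[x == id])
  if (is.na(n)) x else m(n, df)
}
transform(df1, parent = sapply(id, m, df1), row.names = NULL)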
     id child parent
1  s123  <NA>   s123
2 s1004  s123   s123
3 s1009 s1004   s123
4 s1010  <NA>  s1010
5 s1103 s1009   s123