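I have a data frame in the format shown below. When a user buys a new item, the row gets a unique id value. If the same user then buys another item, the child column of the new row holds the previous id.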
df <- data.frame(id= c('s123','s1004','s1009','s1010'),child = c("",'s123','s1004',""))
> df
id child
1 s123
2 s1004 s123
3 s1009 s1004
4 s1010
Now I want to create a new column, parent, that holds the initial id value of each chain.
expect_df <- data.frame(id= c('s123','s1004','s1009','s1010'),child = c("",'s123','s1004',""),parent = c('s123','s123','s123','s1010'))
> expect_df
id child parent
1 s123 s123
2 s1004 s123 s123
3 s1009 s1004 s123
4 s1010 s1010
Answer 0 (score: 1)
Data (make sure your inputs are characters and not factors, and make sure your "" values are NA):
df <- data.frame(id= c('s123','s1004','s1009','s1010'),child = c(NA,'s123','s1004',NA),stringsAsFactors = FALSE)
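If you start from the data frame exactly as posted in the question, with "" in child, the preparation described above can be sketched as follows. This is a minimal base R sketch that assumes the empty string is the only marker for a missing value.

```r
# The question's original data, with "" marking a missing child
df <- data.frame(id = c('s123', 's1004', 's1009', 's1010'),
                 child = c("", 's123', 's1004', ""))

# Convert every column to character (covers factor columns from older R defaults)
df[] <- lapply(df, as.character)

# Treat the empty string as a missing value
df$child[df$child == ""] <- NA
```

After this, df has the same content as the data frame used in this answer.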
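Code:
df$parent <- NA
repeat {
  # seed the chain with the first id that has no parent yet
  sid <- df$id[which(is.na(df$parent))[1]]
  # a row joins the chain if it shares a value with sid; `<<-` grows sid as rows match
  in_chain <- apply(df, 1, function(x) { x <- na.omit(x); if (any(x %in% sid)) { sid <<- c(sid, x); TRUE } else FALSE })
  df$parent[in_chain] <- sid[1]
  if (all(!is.na(df$parent))) break
}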
Result:
# id child parent
# 1 s123 <NA> s123
# 2 s1004 s123 s123
# 3 s1009 s1004 s123
# 4 s1010 <NA> s1010
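The repeat loop appears to assume that the first unassigned row of each chain is its root, so it depends on row order. As an alternative sketch (not the answerer's method), the lookup can be done with match() by stepping every row back one link at a time. The helper name root_id is hypothetical, and the sketch assumes each id appears only once in the id column.

```r
# Hypothetical helper: follow child -> id links until no previous id remains
root_id <- function(id, child) {
  root <- id
  # A chain has at most length(id) - 1 links, so this bound also stops cycles
  for (i in seq_along(id)) {
    prev <- child[match(root, id)]   # previous id of the current candidate root
    step <- !is.na(prev)             # rows that can still move one link back
    if (!any(step)) break
    root[step] <- prev[step]
  }
  root
}

df <- data.frame(id = c('s123', 's1004', 's1009', 's1010'),
                 child = c(NA, 's123', 's1004', NA),
                 stringsAsFactors = FALSE)
df$parent <- root_id(df$id, df$child)
df$parent
# "s123" "s123" "s123" "s1010"
```

Because each pass looks ids up by value rather than by position, the result does not depend on the order of the rows.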
Answer 1 (score: 0)
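m <- function(x, df) {
  n <- with(df, child[x == id])   # previous id of x, NA at the start of a chain
  ifelse(is.na(n), x, m(n, df))   # recurse until there is no previous id
}
transform(df, parent = sapply(id, m, df), row.names = NULL)
# Note: the original call passed an undefined df1 to sapply. The output below has a fifth
# row (s1103, s1009) that is not in df above, so it presumably came from that data frame.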
id child parent
1 s123 <NA> s123
2 s1004 s123 s123
3 s1009 s1004 s123
4 s1010 <NA> s1010
5 s1103 s1009 s123