I'm working with a data frame with the following structure:
df = data.frame(customer = c(1,2)
, destination_1 = c("c", "b")
, destination_2 = c("a", NA)
)
+----------+---------------+---------------+
| customer | destination_1 | destination_2 |
+----------+---------------+---------------+
| 1        | c             | a             |
+----------+---------------+---------------+
| 2        | b             | NA            |
+----------+---------------+---------------+
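The destinations within a row (i.e. across the columns) are not in any particular order. For example, for customer 1 the destinations are listed as c, a.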
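I want to add n columns to df (where n = the number of unique destinations in the whole df) as flag fields showing whether each customer has ever been to each destination, i.e.: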
df$destination_a <- c(1,0)
df$destination_b <- c(0,1)
df$destination_c <- c(1,0)
+----------+---------------+---------------+---------------+---------------+---------------+
| customer | destination_1 | destination_2 | destination_a | destination_b | destination_c |
+----------+---------------+---------------+---------------+---------------+---------------+
| 1        | c             | a             | 1             | 0             | 1             |
+----------+---------------+---------------+---------------+---------------+---------------+
| 2        | b             | NA            | 0             | 1             | 0             |
+----------+---------------+---------------+---------------+---------------+---------------+
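For reference, n can be computed directly in base R: `unlist()` flattens the destination columns into one vector, `unique()` removes repeats, and `sort()` drops the `NA` by default. A minimal sketch, assuming (this is not stated in the question) that the original destination columns are exactly the ones named `destination_<number>`:

```r
df <- data.frame(customer = c(1, 2),
                 destination_1 = c("c", "b"),
                 destination_2 = c("a", NA),
                 stringsAsFactors = FALSE)

# Assumption: the original destination columns are named destination_<number>
dest_cols <- grep("^destination_[0-9]+$", names(df), value = TRUE)

# Flatten, de-duplicate and sort; sort() removes NA by default
dests <- sort(unique(as.character(unlist(df[dest_cols], use.names = FALSE))))
n <- length(dests)

dests  # "a" "b" "c"
n      # 3
```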
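I only have the faintest idea of how to achieve this. Maybe using apply in a loop? If possible, I'd prefer a base R solution. I look forward to hearing any ideas. Thanks.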
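The `apply` idea can indeed work in base R without a hand-written nested loop over rows and columns: loop only over the distinct destinations, and let `apply()` check each row. A minimal sketch, again assuming the destination columns are the ones named `destination_<number>`; note that the flags come out as integers (`1L`/`0L`) rather than the doubles used in the question:

```r
df <- data.frame(customer = c(1, 2),
                 destination_1 = c("c", "b"),
                 destination_2 = c("a", NA),
                 stringsAsFactors = FALSE)

# Assumption: the original destination columns are named destination_<number>
dest_cols <- grep("^destination_[0-9]+$", names(df), value = TRUE)

# All distinct destinations in the data, NA removed, sorted
dests <- sort(unique(as.character(unlist(df[dest_cols], use.names = FALSE))))

# For each destination, apply() walks the rows (MARGIN = 1) and tests
# whether that destination appears anywhere in the row; NA never matches
for (d in dests) {
  df[[paste0("destination_", d)]] <-
    as.integer(apply(df[dest_cols], 1, function(r) d %in% r))
}

df
```

Because `dest_cols` is computed before the loop, the new flag columns are never scanned as if they were destinations.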
Answer (score: 1)
You can do it in three steps:
# 1) Get the number and names of the required columns by extracting the unique destinations from your data frame:
df2 <- df[, c(2, 3)]  # subset the columns containing the destinations (modify the column numbers if needed)
all <- c()
for (i in 1:ncol(df2)) {
  for (j in 1:nrow(df2)) {
    all <- c(all, as.character(df2[j, i]))
  }
}
all <- sort(unique(all))  # sort() also drops the NA
# 2) From these values, build the column names and create the columns, filled with NA:
all_names <- paste0("destination_", all)
df[, all_names] <- NA
# At this point the data frame has the required columns.
# 3) Fill these columns with values (0 or 1):
for (j in 1:nrow(df)) {
  # destinations of the customer in row j; only the destination columns are
  # used, so the customer id and the new flag columns can't match by mistake
  rowvalues <- as.character(unlist(df2[j, ]))
  for (i in 1:length(all)) {
    df[j, all_names[i]] <- if (all[i] %in% rowvalues) 1 else 0
  }
}
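Step 3 can also be done without any loops by cross-tabulating customers against destinations with `table()`, which ignores `NA` by default. This is a different technique from the loop above, offered as a sketch only. It assumes the destination columns are `destination_1` and `destination_2`, and it matches the counts back to the rows by `customer` value, so rows sharing a customer id would get the same combined flags:

```r
df <- data.frame(customer = c(1, 2),
                 destination_1 = c("c", "b"),
                 destination_2 = c("a", NA),
                 stringsAsFactors = FALSE)
dest_cols <- c("destination_1", "destination_2")  # assumed destination columns

# Pair every destination cell with its row's customer: unlist() stacks the
# columns one after another, so the customer vector is repeated once per column
counts <- unclass(table(customer    = rep(df$customer, length(dest_cols)),
                        destination = unlist(df[dest_cols], use.names = FALSE)))

# Any visit (count > 0) becomes 1, otherwise 0
flags <- (counts > 0) * 1L

# Put the rows back in the data frame's customer order and name the columns
flags <- flags[as.character(df$customer), , drop = FALSE]
colnames(flags) <- paste0("destination_", colnames(flags))
rownames(flags) <- NULL

df <- cbind(df, as.data.frame(flags))
df
```

Multiplying the logical matrix by `1L` keeps the flags as 0/1 integers; `table()` also sorts the destination levels, so the new columns come out in alphabetical order.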