I have data in this format in R:
customer_key item_key units
2669699 16865 1.00
2669699 16866 1.00
2669699 46963 2.00
2685256 55271 1.00
2685256 43458 1.00
2685256 54977 1.00
2685256 2533 1.00
2685256 55011 1.00
2685256 44785 2.00
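But I want the unique customer_key values as a column, and I want the other variable names to be the unique values in item_key, with units as their values, like this: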
customer_key '16865' '16866' '46963' '55271' '43458' '54977' '2533'
2669699 1.00 1.00 1.00 0.00 0.00 0.00 0.00
2685256 0.00 0.00 0.00 1.00 1.00 1.00 2.00
Please help me transform the data for cluster analysis.
Answer 0 (score: 3)
This is just a simple dcast task. Assuming df is your data set:
library(reshape2)
dcast(df, customer_key ~ item_key , value.var = "units", fill = 0)
# customer_key 2533 16865 16866 43458 44785 46963 54977 55011 55271
# 1 2669699 0 1 1 0 0 2 0 0 0
# 2 2685256 1 0 0 1 2 0 1 1 1
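One case the sample data does not exercise is a customer buying the same item on more than one row. When a (customer_key, item_key) pair repeats, dcast falls back to counting rows unless an aggregation function is supplied. The sketch below uses a small hypothetical data frame (df_dup is not from the question) and assumes that summing units is the desired behaviour:

```r
library(reshape2)

# Hypothetical data: customer 1 has two rows for item 10
df_dup <- data.frame(
  customer_key = c(1L, 1L, 1L, 2L),
  item_key     = c(10L, 10L, 20L, 20L),
  units        = c(1, 2, 1, 3)
)

# Sum units over repeated pairs; fill absent pairs with 0
wide <- dcast(df_dup, customer_key ~ item_key,
              value.var = "units", fun.aggregate = sum, fill = 0)
wide
```

If each pair is guaranteed to appear only once, the plain call above is enough.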
Answer 1 (score: 3)
Here is one way.
library(tidyr)
spread(mydf,item_key, units, fill = 0)
# customer_key 2533 16865 16866 43458 44785 46963 54977 55011 55271
#1 2669699 0 1 1 0 0 2 0 0 0
#2 2685256 1 0 0 1 2 0 1 1 1
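In tidyr 1.0.0 and later, spread is superseded by pivot_wider. A minimal sketch of the same reshaping with the newer function, assuming the question's data is stored in df (rebuilt here so the snippet stands alone):

```r
library(tidyr)

# The question's sample data
df <- data.frame(
  customer_key = c(2669699L, 2669699L, 2669699L, 2685256L, 2685256L,
                   2685256L, 2685256L, 2685256L, 2685256L),
  item_key     = c(16865L, 16866L, 46963L, 55271L, 43458L, 54977L,
                   2533L, 55011L, 44785L),
  units        = c(1, 1, 2, 1, 1, 1, 1, 1, 2)
)

# One row per customer, one column per item, missing pairs filled with 0
wide <- pivot_wider(df,
                    names_from  = item_key,
                    values_from = units,
                    values_fill = list(units = 0))
wide
```

Unlike spread, pivot_wider keeps the item columns in order of first appearance rather than sorting them.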
Answer 2 (score: 3)
Since the packages have already been covered (+1 to everyone), here are a couple of base R solutions to join the party:
xtabs:
xtabs(units ~ customer_key + item_key, df)
# item_key
# customer_key 2533 16865 16866 43458 44785 46963 54977 55011 55271
# 2669699 0 1 1 0 0 2 0 0 0
# 2685256 1 0 0 1 2 0 1 1 1
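Note that xtabs returns a contingency table rather than a data frame. If a later step expects a data frame, one possible conversion is as.data.frame.matrix, which keeps the wide layout and puts customer_key in the row names. A sketch, with the question's data rebuilt so it runs on its own:

```r
# The question's sample data
df <- data.frame(
  customer_key = c(2669699L, 2669699L, 2669699L, 2685256L, 2685256L,
                   2685256L, 2685256L, 2685256L, 2685256L),
  item_key     = c(16865L, 16866L, 46963L, 55271L, 43458L, 54977L,
                   2533L, 55011L, 44785L),
  units        = c(1, 1, 2, 1, 1, 1, 1, 1, 2)
)

tab <- xtabs(units ~ customer_key + item_key, df)

# Keep the wide shape; customer_key values become the row names
wide <- as.data.frame.matrix(tab)
wide
```

Calling plain as.data.frame on the table would instead return it in long format, with one row per customer/item combination.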
reshape:
reshape(df, direction = "wide", idvar = "customer_key", timevar = "item_key")
# customer_key units.16865 units.16866 units.46963 units.55271
# 1 2669699 1 1 2 NA
# 4 2685256 NA NA NA 1
# units.43458 units.54977 units.2533 units.55011 units.44785
# 1 NA NA NA NA NA
# 4 1 1 1 1 2
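The reshape result differs from the others in two ways: missing combinations are NA rather than 0, and the column names carry a units. prefix. A short sketch of one way to tidy both, assuming zeros are the right fill for the analysis (data rebuilt so the snippet stands alone):

```r
# The question's sample data
df <- data.frame(
  customer_key = c(2669699L, 2669699L, 2669699L, 2685256L, 2685256L,
                   2685256L, 2685256L, 2685256L, 2685256L),
  item_key     = c(16865L, 16866L, 46963L, 55271L, 43458L, 54977L,
                   2533L, 55011L, 44785L),
  units        = c(1, 1, 2, 1, 1, 1, 1, 1, 2)
)

wide <- reshape(df, direction = "wide",
                idvar = "customer_key", timevar = "item_key")

# Replace the NAs from absent customer/item pairs with 0
wide[is.na(wide)] <- 0

# Drop the "units." prefix so the columns are named by item_key only
names(wide) <- sub("^units\\.", "", names(wide))
wide
```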
Answer 3 (score: 2)
library(dplyr); library(tidyr)
df2 <- df %>% arrange(item_key) %>% spread(item_key, units, fill=0)
df2
# customer_key 2533 16865 16866 43458 44785 46963 54977 55011 55271
# 1 2669699 0 1 1 0 0 2 0 0 0
# 2 2685256 1 0 0 1 2 0 1 1 1
Data
df <- structure(list(customer_key = c(2669699L, 2669699L, 2669699L,
2685256L, 2685256L, 2685256L, 2685256L, 2685256L, 2685256L),
item_key = c(16865L, 16866L, 46963L, 55271L, 43458L, 54977L,
2533L, 55011L, 44785L), units = c(1, 1, 2, 1, 1, 1, 1, 1,
2)), .Names = c("customer_key", "item_key", "units"), class = "data.frame", row.names = c(NA,
-9L))