Given a data frame dat in the following format:
   property_id  tenant                   count
1  1            Burlington Coat Factory  1
2  1            Macy's                   2
3  1            Sears                    3
4  1            AMC Theatres             4
5  1            Macy's Home              5
6  2            Burlington Coat Factory  1
7  2            JCPenney                 2
8  2            Value City               3
How can we produce the following?
property_id  X1                       X2        X3          X4            X5
1            Burlington Coat Factory  Macy's    Sears       AMC Theatres  Macy's Home
2            Burlington Coat Factory  JCPenney  Value City  <NA>          <NA>
Melt/reshape seems to produce huge sparse matrices.
I got there very clumsily with the following, but it is awful, and I am looking for a better way:
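# Pre-allocate an NA-filled frame: 1167 properties x up to 20 tenants
df <- data.frame(matrix(NA, 1167, 20))
df['id'] <- unique(dat$property_id)
for (i in seq_len(nrow(df))) {
  # Copy this property's tenants into the leading columns of row i
  tenants <- subset(dat, dat$property_id == df[i, 'id'])$tenant
  df[i, 1:length(tenants)] <- t(tenants)
}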
Answer 0 (score: 1)
spread seems to do exactly what you need:
library(tidyverse)
spread(dat, count, tenant)
# A tibble: 2 x 6
# property_id `1` `2` `3` `4` `5`
# <dbl> <chr> <chr> <chr> <chr> <chr>
# 1 1 Burlington Coat Factory Macy's Sears AMC Theatres Macy's Home
# 2 2 Burlington Coat Factory JCPenney Value City NA NA
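In tidyr 1.0.0 and later, spread is superseded by pivot_wider, so the same reshaping can probably be written as in the sketch below. The dat constructed here is only a reconstruction of the example in the question (assuming character tenants and numeric ids), and the names_prefix = "X" argument is used to get the X1..X5 column names the question asks for.

```r
library(tidyr)

# Assumed reconstruction of the example data from the question
dat <- data.frame(
  property_id = c(1, 1, 1, 1, 1, 2, 2, 2),
  tenant = c("Burlington Coat Factory", "Macy's", "Sears", "AMC Theatres",
             "Macy's Home", "Burlington Coat Factory", "JCPenney", "Value City"),
  count = c(1, 2, 3, 4, 5, 1, 2, 3),
  stringsAsFactors = FALSE
)

# One row per property_id; each value of count becomes a column X<count>.
# Properties with fewer tenants are padded with NA.
wide <- pivot_wider(dat,
                    names_from = count,
                    values_from = tenant,
                    names_prefix = "X")
wide
```

Because the new columns are keyed on count, the result has only as many tenant columns as the largest count in the data, so there is no need to pre-allocate a fixed 20-column frame.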
Another option:
library(reshape2)
dcast(dat, property_id ~ count, value.var = "tenant")
# property_id 1 2 3 4 5
# 1 1 Burlington Coat Factory Macy's Sears AMC Theatres Macy's Home
# 2 2 Burlington Coat Factory JCPenney Value City <NA> <NA>
Finally:
reshape(dat, v.names = "tenant", idvar = "property_id", timevar = "count", direction = "wide")
# property_id tenant.1 tenant.2 tenant.3 tenant.4 tenant.5
# 1 1 Burlington Coat Factory Macy's Sears AMC Theatres Macy's Home
# 6 2 Burlington Coat Factory JCPenney Value City <NA> <NA>
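The reshape result keeps the tenant.N column names and the original row names (1 and 6). If the exact layout from the question is wanted, a small cleanup step along the following lines should work in base R alone. This is a sketch: the dat below is an assumed reconstruction of the question's example, and the renaming to X1..X5 is just one way to match the requested header.

```r
# Assumed reconstruction of the example data from the question
dat <- data.frame(
  property_id = c(1, 1, 1, 1, 1, 2, 2, 2),
  tenant = c("Burlington Coat Factory", "Macy's", "Sears", "AMC Theatres",
             "Macy's Home", "Burlington Coat Factory", "JCPenney", "Value City"),
  count = c(1, 2, 3, 4, 5, 1, 2, 3),
  stringsAsFactors = FALSE
)

wide <- reshape(dat, v.names = "tenant", idvar = "property_id",
                timevar = "count", direction = "wide")

# Turn tenant.1 ... tenant.5 into X1 ... X5
names(wide) <- sub("^tenant\\.", "X", names(wide))

# Replace the leftover row names (1, 6) with a clean 1..n sequence
rownames(wide) <- NULL
wide
```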