I am trying to reshape a data.frame that contains data like this:
COMPANY COUNTRY CURRENCY IND1 WGT1 IND2 WGT2 IND3 WGT3
COMP1 USA USD HEALTH .58 RETAIL .42 <NA> 0
COMP2 USA USD AUTO .78 RETAIL .12 TRANSPRT .1
COMP3 CAN CAD SOFTWARE 1 <NA> 0 <NA> 0
I would like to convert it to:
COMPANY COUNTRY CURRENCY HEALTH AUTO SOFTWARE RETAIL TRANSPRT
COMP1 USA USD .58 0 0 .42 0
COMP2 USA USD 0 .78 0 .12 .1
COMP3 CAN CAD 0 0 1 0 0
What is the best way to do this? Thanks in advance, BE
Answer 0 (score: 2)
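We can use melt/dcast from the devel version of data.table, i.e. v1.9.5. Convert the 'data.frame' to a 'data.table' with setDT(df1). melt from data.table can take multiple measure columns. We specify the column names prefixed with "IND" and "WGT" as patterns and reshape from "wide" to "long" format. Remove the "variable" column by assigning it to NULL, then dcast from "long" back to "wide", spreading on "value1" and setting "value.var" to "value2".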
library(data.table)  # v1.9.5+
# Wide -> long: pair each INDn column with its WGTn column (value1, value2)
DT <- melt(setDT(df1), measure = patterns('^IND', '^WGT'),
           na.rm = TRUE)[, variable := NULL]
# Long -> wide: one column per industry, weights as values, 0 where absent
dcast(DT, ... ~ value1, value.var = 'value2', fill = 0)
#    COMPANY COUNTRY CURRENCY AUTO HEALTH RETAIL SOFTWARE TRANSPRT
#1:   COMP1     USA      USD 0.00   0.58   0.42        0      0.0
#2:   COMP2     USA      USD 0.78   0.00   0.12        0      0.1
#3:   COMP3     CAN      CAD 0.00   0.00   0.00        1      0.0
# Data
df1 <- structure(list(COMPANY = c("COMP1", "COMP2", "COMP3"),
    COUNTRY = c("USA", "USA", "CAN"), CURRENCY = c("USD", "USD", "CAD"),
    IND1 = c("HEALTH", "AUTO", "SOFTWARE"), WGT1 = c(0.58, 0.78, 1),
    IND2 = c("RETAIL", "RETAIL", NA), WGT2 = c(0.42, 0.12, 0),
    IND3 = c(NA, "TRANSPRT", NA), WGT3 = c(0, 0.1, 0)),
    .Names = c("COMPANY", "COUNTRY", "CURRENCY", "IND1", "WGT1", "IND2",
    "WGT2", "IND3", "WGT3"), row.names = c(NA, -3L), class = "data.frame")
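To make the "wide" to "long" to "wide" round trip easier to follow, here is a minimal base R sketch of the same idea. It is not the data.table code from this answer: the names ids, long, wts and wide are made up for illustration, and it assumes exactly three IND/WGT pairs as in the sample data.

```r
# Sample data from the question, rebuilt so the sketch is self-contained.
df1 <- data.frame(
  COMPANY  = c("COMP1", "COMP2", "COMP3"),
  COUNTRY  = c("USA", "USA", "CAN"),
  CURRENCY = c("USD", "USD", "CAD"),
  IND1 = c("HEALTH", "AUTO", "SOFTWARE"), WGT1 = c(0.58, 0.78, 1),
  IND2 = c("RETAIL", "RETAIL", NA),       WGT2 = c(0.42, 0.12, 0),
  IND3 = c(NA, "TRANSPRT", NA),           WGT3 = c(0, 0.1, 0),
  stringsAsFactors = FALSE
)

ids <- c("COMPANY", "COUNTRY", "CURRENCY")  # hypothetical helper name

# Wide -> long: stack each (INDk, WGTk) pair under common column names.
long <- do.call(rbind, lapply(1:3, function(k) {
  data.frame(df1[ids],
             IND = df1[[paste0("IND", k)]],
             WGT = df1[[paste0("WGT", k)]],
             stringsAsFactors = FALSE)
}))

# Drop the padding rows that carry no industry.
long <- long[!is.na(long$IND), ]

# Long -> wide: sum weights per company and industry; absent pairs become 0.
wts  <- xtabs(WGT ~ COMPANY + IND, data = long)
wide <- cbind(df1[ids], as.data.frame.matrix(wts)[df1$COMPANY, ])
print(wide)
```

Indexing the cross-tabulation by df1$COMPANY keeps the rows aligned with the id columns even if the companies are not in alphabetical order.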
Answer 1 (score: 2)
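Here is an approach using reshape and xtabs:
# df is the data.frame from the question; columns 4:9 are the IND/WGT pairs
long <- reshape(df, sep = "", varying = 4:9, direction = "long")
# Cross-tabulate the weights by company and industry, then bind the id columns
cbind(df[, 1:3], as.data.frame.matrix(xtabs(WGT ~ COMPANY + IND, long)))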
      COMPANY COUNTRY CURRENCY AUTO HEALTH SOFTWARE RETAIL TRANSPRT
COMP1   COMP1     USA      USD 0.00   0.58        0   0.42      0.0
COMP2   COMP2     USA      USD 0.78   0.00        0   0.12      0.1
COMP3   COMP3     CAN      CAD 0.00   0.00        1   0.00      0.0