I have an R data frame in the following format:
FIRM WORKER HOURS
FIRM1 A1 H1
FIRM1 A2 H2
FIRM1 A3 H3
FIRM1 B1 H4
FIRM1 B2 H5
FIRM2 A1 H6
FIRM2 C1 H7
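Some firms have workers in different education categories (A, B, C, ...). I want to transform the data frame so that the hours for each education category are summed into a column of their own and each firm has exactly one row. So I need to convert the initial data frame into the following form: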
FIRM   HOURS_A   HOURS_B  HOURS_C
FIRM1  H1+H2+H3  H4+H5
FIRM2  H6                 H7
What is the best way to do this?
Answer 0 (score: 2)
Aggregate first, then reshape:
Data:
x <- read.table(header=TRUE, text="
FIRM WORKER HOURS
FIRM1 A1 1
FIRM1 A2 2
FIRM1 A3 3
FIRM1 B1 4
FIRM1 B2 5
FIRM2 A1 6
FIRM2 C1 7
")
Code:
tmp <- aggregate(HOURS~FIRM+WORK, data=within(x, WORK <- substr(WORKER,1,1)), sum)
reshape(tmp, idvar="FIRM", timevar="WORK", direction="wide")
Result:
FIRM HOURS.A HOURS.B HOURS.C
1 FIRM1 6 9 NA
2 FIRM2 6 NA 7
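The two steps above can also be written out separately, which makes the intermediate long-format result easier to inspect. This is a minimal sketch using the same example data. The final replacement of NA with 0 is an assumption about what is wanted (a missing firm/category combination is treated as zero hours) and is not part of the original answer; `wide` is just an illustrative name.

```r
# Same example data as above
x <- read.table(header = TRUE, text = "
FIRM WORKER HOURS
FIRM1 A1 1
FIRM1 A2 2
FIRM1 A3 3
FIRM1 B1 4
FIRM1 B2 5
FIRM2 A1 6
FIRM2 C1 7
")

# Step 1: take the education category from the first letter of WORKER,
# then sum HOURS per firm and category (long format)
x$WORK <- substr(x$WORKER, 1, 1)
tmp <- aggregate(HOURS ~ FIRM + WORK, data = x, FUN = sum)

# Step 2: one row per firm, one HOURS.<category> column per category
wide <- reshape(tmp, idvar = "FIRM", timevar = "WORK", direction = "wide")

# Optional (assumption): treat missing firm/category combinations as 0 hours
wide[is.na(wide)] <- 0
wide
```

If the NA values should stay visible as "no workers in this category", simply skip the last assignment.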
Answer 1 (score: 0)
I assume you mean that you actually want to sum some values, and that your data.frame looks something like this:
mydf <- structure(
list(FIRM = c("FIRM1", "FIRM1", "FIRM1", "FIRM1", "FIRM1", "FIRM2", "FIRM2"),
WORKER = c("A", "A", "A", "B", "B", "A", "C"),
HOURS = c(10L, 20L, 15L, 13L, 12L, 9L, 16L)),
.Names = c("FIRM", "WORKER", "HOURS"),
class = "data.frame", row.names = c(NA, -7L))
mydf
# FIRM WORKER HOURS
# 1 FIRM1 A 10
# 2 FIRM1 A 20
# 3 FIRM1 A 15
# 4 FIRM1 B 13
# 5 FIRM1 B 12
# 6 FIRM2 A 9
# 7 FIRM2 C 16
Then you can use xtabs:
xtabs(HOURS ~ FIRM + WORKER, mydf)
# WORKER
# FIRM A B C
# FIRM1 45 25 0
# FIRM2 9 0 16
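Note that xtabs returns a contingency table rather than a data frame. If a data frame with one row per firm is needed, as in the question, the table can be converted. The following is a sketch under that assumption; the `HOURS_` prefix and the name `wide` are chosen here for illustration only.

```r
# Same example data as above, built with data.frame() for brevity
mydf <- data.frame(
  FIRM   = c("FIRM1", "FIRM1", "FIRM1", "FIRM1", "FIRM1", "FIRM2", "FIRM2"),
  WORKER = c("A", "A", "A", "B", "B", "A", "C"),
  HOURS  = c(10L, 20L, 15L, 13L, 12L, 9L, 16L)
)

# Cross-tabulate the summed hours by firm and category
tab <- xtabs(HOURS ~ FIRM + WORKER, data = mydf)

# Convert the 2-d table to a data frame, keeping its wide layout
wide <- as.data.frame.matrix(tab)
names(wide) <- paste0("HOURS_", names(wide))

# Move the firm names from the row names into a regular column
wide <- cbind(FIRM = rownames(wide), wide)
rownames(wide) <- NULL
wide
```

Calling plain as.data.frame() on the table would instead give the long format (one row per firm/category pair), which is why as.data.frame.matrix() is used here.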
Alternatively, you can melt the dataset and use dcast to reshape it:
library(reshape2)
dfL <- melt(mydf, id.vars=c("FIRM", "WORKER"))
dcast(dfL, FIRM ~ variable + WORKER, fun.aggregate=sum, value.var="value")
# FIRM HOURS_A HOURS_B HOURS_C
# 1 FIRM1 45 25 0
# 2 FIRM2 9 0 16