I'm having trouble creating a pivot table from the following data frame:
c1 c2 c3 c4
E 5.76 201 A la vista
E 47530.71 201 A la vista
E 82.85 201 A la vista
L 11376.55 201 A la vista
E 6683.37 203 A la vista
E 66726.52 203 A la vista
E 2.39 203 A la vista
E 79066.07 202 Montoxv_a60d
E 14715.71 202 Montoxv_a60d
E 22661.78 202 Montoxv_a60d
L 81146.25 124 Montoxv_a90d
L 471730.2 124 Montoxv_a186d
E 667812.84 124 Montoxv_a186d
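My problem is that I don't know how to build a pivot table or summary table in R from these four variables, so that the rows of the final table are the levels of c1 and c3 and the columns are the levels of c4. For each combination of row levels, the values of c2 must be aggregated by sum. I would like to get something like this: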
A la vista Montoxv_a60d Montoxv_a186d Montoxv_a90d
E 201 47619.32 0 0 0
E 203 73412.28 0 0 0
E 202 0 116443.56 0 0
E 124 0 0 667812.84 0
L 201 11376.55 0 0 0
L 124 0 0 471730.2 81146.25
Answer 0 (score: 14)
You can do this with dcast from the reshape2 package:
dcast(mydata, c1 + c3 ~ c4, value.var="c2", fun.aggregate=sum)
For example:
library(reshape2)
# reproducible version of your data
mydata = read.csv(text="c1,c2,c3,c4
E,5.76,201,A la vista
E,47530.71,201,A la vista
E,82.85,201,A la vista
L,11376.55,201,A la vista
E,6683.37,203,A la vista
E,66726.52,203,A la vista
E,2.39,203,A la vista
E,79066.07,202,Montoxv_a60d
E,14715.71,202,Montoxv_a60d
E,22661.78,202,Montoxv_a60d
L,81146.25,124,Montoxv_a90d
L,471730.2,124,Montoxv_a186d
E,667812.84,124,Montoxv_a186d", header=TRUE)
result = dcast(mydata, c1 + c3 ~ c4, value.var="c2", fun.aggregate=sum)
which produces
c1 c3 A la vista Montoxv_a186d Montoxv_a60d Montoxv_a90d
1 E 124 0.00 667812.8 0.0 0.00
2 E 201 47619.32 0.0 0.0 0.00
3 E 202 0.00 0.0 116443.6 0.00
4 E 203 73412.28 0.0 0.0 0.00
5 L 124 0.00 471730.2 0.0 81146.25
6 L 201 11376.55 0.0 0.0 0.00
Answer 1 (score: 3)
ftable(with(mydata, tapply(c2, list(c1,c3,c4), sum) ) )
A la vista Montoxv_a186d Montoxv_a60d Montoxv_a90d
E 124 NA 667812.84 NA NA
201 47619.32 NA NA NA
202 NA NA 116443.56 NA
203 73412.28 NA NA NA
L 124 NA 471730.20 NA 81146.25
201 11376.55 NA NA NA
202 NA NA NA NA
203 NA NA NA NA
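The question asks for zeros rather than NA in the empty cells. A minimal base R sketch of one way to get there, assuming it is acceptable to treat a cell with no observations as a total of 0 (the name `sums` is only illustrative):

```r
# Same reproducible data as in the reshape2 answer
mydata <- read.csv(text = "c1,c2,c3,c4
E,5.76,201,A la vista
E,47530.71,201,A la vista
E,82.85,201,A la vista
L,11376.55,201,A la vista
E,6683.37,203,A la vista
E,66726.52,203,A la vista
E,2.39,203,A la vista
E,79066.07,202,Montoxv_a60d
E,14715.71,202,Montoxv_a60d
E,22661.78,202,Montoxv_a60d
L,81146.25,124,Montoxv_a90d
L,471730.2,124,Montoxv_a186d
E,667812.84,124,Montoxv_a186d", header = TRUE)

# Sum c2 within every (c1, c3, c4) cell; cells without data come back as NA
sums <- with(mydata, tapply(c2, list(c1 = c1, c3 = c3, c4 = c4), sum))

# Assumption: an unobserved combination should count as a total of zero
sums[is.na(sums)] <- 0

# Flatten the 3-d array into a printable two-way layout
ftable(sums)
```

The all-zero rows for L 202 and L 203 are still printed, because the array holds every combination of the grouping levels.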
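Answer 2 (score: 2)
Here are a few options: two in base R, and one using the newer "dplyr" and "tidyr" packages.
Base R's reshape cannot handle aggregation, so you need to resort to another function (such as aggregate) before reshaping.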
reshape(
aggregate(c2 ~ c1 + c3 + c4, mydata, sum),
direction = "wide", idvar = c("c1", "c3"), timevar = "c4")
# c1 c3 c2.A la vista c2.Montoxv_a186d c2.Montoxv_a60d c2.Montoxv_a90d
# 1 E 201 47619.32 NA NA NA
# 2 L 201 11376.55 NA NA NA
# 3 E 203 73412.28 NA NA NA
# 4 E 124 NA 667812.8 NA NA
# 5 L 124 NA 471730.2 NA 81146.25
# 6 E 202 NA NA 116443.6 NA
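The reshape output above keeps a "c2." prefix on the column names and leaves NA in the empty cells. A possible clean-up, sketched under the assumption that missing combinations should be reported as 0 (the names `agg` and `wide` are only illustrative):

```r
mydata <- read.csv(text = "c1,c2,c3,c4
E,5.76,201,A la vista
E,47530.71,201,A la vista
E,82.85,201,A la vista
L,11376.55,201,A la vista
E,6683.37,203,A la vista
E,66726.52,203,A la vista
E,2.39,203,A la vista
E,79066.07,202,Montoxv_a60d
E,14715.71,202,Montoxv_a60d
E,22661.78,202,Montoxv_a60d
L,81146.25,124,Montoxv_a90d
L,471730.2,124,Montoxv_a186d
E,667812.84,124,Montoxv_a186d", header = TRUE)

# Step 1: aggregate first, because reshape() cannot aggregate by itself
agg <- aggregate(c2 ~ c1 + c3 + c4, mydata, sum)

# Step 2: long to wide, one column per level of c4
wide <- reshape(agg, direction = "wide",
                idvar = c("c1", "c3"), timevar = "c4")

# Step 3: drop the "c2." prefix that reshape() adds to the value columns
names(wide) <- sub("^c2\\.", "", names(wide))

# Step 4 (assumption): report missing combinations as 0 instead of NA
wide[is.na(wide)] <- 0

# Step 5: sort by the row keys so the result is easier to read
wide <- wide[order(wide$c1, wide$c3), ]
wide
```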
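If your aggregation only involves sums, you can also use xtabs to do the aggregating. Because there are multiple variables on the RHS of the formula, you end up with a multi-dimensional array, but it can easily be coerced into a rectangular form with ftable (as @BondedDust did in his answer). Note that the output from ftable differs slightly from the others because, by default, it returns every combination of the grouping variables, even when a row is completely empty.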
ftable(xtabs(c2 ~ c1 + c3 + c4, mydata))
# c4 A la vista Montoxv_a186d Montoxv_a60d Montoxv_a90d
# c1 c3
# E 124 0.00 667812.84 0.00 0.00
# 201 47619.32 0.00 0.00 0.00
# 202 0.00 0.00 116443.56 0.00
# 203 73412.28 0.00 0.00 0.00
# L 124 0.00 471730.20 0.00 81146.25
# 201 11376.55 0.00 0.00 0.00
# 202 0.00 0.00 0.00 0.00
# 203 0.00 0.00 0.00 0.00
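If the completely empty rows are unwanted, one possible workaround is to collapse c1 and c3 into a single row key before calling xtabs. This is only a sketch; the column name `row_key` is made up for the illustration, and the row labels become combined strings such as "E.201" rather than two separate columns.

```r
mydata <- read.csv(text = "c1,c2,c3,c4
E,5.76,201,A la vista
E,47530.71,201,A la vista
E,82.85,201,A la vista
L,11376.55,201,A la vista
E,6683.37,203,A la vista
E,66726.52,203,A la vista
E,2.39,203,A la vista
E,79066.07,202,Montoxv_a60d
E,14715.71,202,Montoxv_a60d
E,22661.78,202,Montoxv_a60d
L,81146.25,124,Montoxv_a90d
L,471730.2,124,Montoxv_a186d
E,667812.84,124,Montoxv_a186d", header = TRUE)

# Combine c1 and c3 into one factor; drop = TRUE keeps only the
# combinations that actually occur in the data
mydata$row_key <- interaction(mydata$c1, mydata$c3, drop = TRUE)

# Two-way table of summed c2: one row per observed (c1, c3) pair
tab <- xtabs(c2 ~ row_key + c4, mydata)
tab
```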
Finally, you can also use "tidyr" and "dplyr", which offer functionality similar to the tools in "reshape" and "reshape2", but with a slightly different syntax.
library(tidyr)
library(dplyr)
mydata %>% ## The source dataset
group_by(c1, c3, c4) %>% ## Grouping variables
summarise(c2 = sum(c2)) %>% ## aggregation of the c2 column
ungroup() %>% ## spread doesn't seem to like groups
spread(c4, c2) ## spread makes the data wide
# Source: local data frame [6 x 6]
#
# c1 c3 A la vista Montoxv_a186d Montoxv_a60d Montoxv_a90d
# 1 E 124 NA 667812.8 NA NA
# 2 E 201 47619.32 NA NA NA
# 3 E 202 NA NA 116443.6 NA
# 4 E 203 73412.28 NA NA NA
# 5 L 124 NA 471730.2 NA 81146.25
# 6 L 201 11376.55 NA NA NA
Answer 3 (score: 0)
With rpivotTable you can pivot interactively, much as you would in Excel.
install.packages("rpivotTable")
library(rpivotTable)
data(mtcars)
rpivotTable(mtcars)
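Answer 4 (score: 0)
This can also be produced easily with the pivottabler package, using either the one-line quick-pivot function or the more verbose syntax: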
df <- read.csv(text="c1,c2,c3,c4
E,5.76,201,A la vista
E,47530.71,201,A la vista
E,82.85,201,A la vista
L,11376.55,201,A la vista
E,6683.37,203,A la vista
E,66726.52,203,A la vista
E,2.39,203,A la vista
E,79066.07,202,Montoxv_a60d
E,14715.71,202,Montoxv_a60d
E,22661.78,202,Montoxv_a60d
L,81146.25,124,Montoxv_a90d
L,471730.2,124,Montoxv_a186d
E,667812.84,124,Montoxv_a186d", header=TRUE)
# quick pivot syntax
library(pivottabler)
qhpvt(df, c("c1","c3"), "c4", "sum(c2)", totals="NONE")
# verbose syntax
library(pivottabler)
pt <- PivotTable$new()
pt$addData(df)
pt$addColumnDataGroups("c4", addTotal=FALSE)
pt$addRowDataGroups("c1", addTotal=FALSE)
pt$addRowDataGroups("c3", addTotal=FALSE)
pt$defineCalculation(calculationName="calc1", summariseExpression="sum(c2)")
pt$renderPivot()
More information about the pivottabler package is available at:
http://pivottabler.org.uk/articles/v01-introduction.html
Note: I am the author of the package.
Answer 5 (score: 0)
This is easily done with the pivot_wider function from tidyr:
library(tidyr)
tidyr::pivot_wider(data = df, id_cols = c(c1, c3), names_from = c4, values_from = c2, values_fn = sum)
# A tibble: 6 x 6
c1 c3 `A la vista` Montoxv_a60d Montoxv_a90d Montoxv_a186d
<chr> <int> <dbl> <dbl> <dbl> <dbl>
1 " E" 201 47619. NA NA NA
2 " L" 201 11377. NA NA NA
3 " E" 203 73412. NA NA NA
4 " E" 202 NA 116444. NA NA
5 " L" 124 NA NA 81146. 471730.
6 " E" 124 NA NA NA 667813.
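Two details of the output above may be worth adjusting: the empty cells are NA rather than the 0 requested in the question, and the c1 values carry a leading space (" E"), which suggests the input had padding after the commas. A hedged variant, assuming tidyr 1.1.0 or later (where `values_fn` and `values_fill` accept a bare function and a scalar) and assuming the padding is unwanted:

```r
library(tidyr)

# strip.white = TRUE trims stray spaces around unquoted character fields,
# so c1 is read as "E"/"L" even if the raw text has padding
df <- read.csv(text = "c1,c2,c3,c4
E,5.76,201,A la vista
E,47530.71,201,A la vista
E,82.85,201,A la vista
L,11376.55,201,A la vista
E,6683.37,203,A la vista
E,66726.52,203,A la vista
E,2.39,203,A la vista
E,79066.07,202,Montoxv_a60d
E,14715.71,202,Montoxv_a60d
E,22661.78,202,Montoxv_a60d
L,81146.25,124,Montoxv_a90d
L,471730.2,124,Montoxv_a186d
E,667812.84,124,Montoxv_a186d", header = TRUE, strip.white = TRUE)

wide <- pivot_wider(
  data        = df,
  id_cols     = c(c1, c3),  # row keys
  names_from  = c4,         # one output column per level of c4
  values_from = c2,
  values_fn   = sum,        # aggregate duplicates within a cell
  values_fill = 0           # assumption: an absent combination means 0
)
wide
```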
Answer 6 (score: 0)
The data.table package, like reshape2, provides the functions melt() and dcast() for this kind of operation. So you can do:
require(data.table)
setDT(mydata)
dcast(mydata, c1 + c3 ~ c4,
value.var = "c2", fun.aggregate = sum)
This is also the fastest solution.
Data from @david-robinson:
mydata = read.csv(text = "c1,c2,c3,c4
E,5.76,201,A la vista
E,47530.71,201,A la vista
E,82.85,201,A la vista
L,11376.55,201,A la vista
E,6683.37,203,A la vista
E,66726.52,203,A la vista
E,2.39,203,A la vista
E,79066.07,202,Montoxv_a60d
E,14715.71,202,Montoxv_a60d
E,22661.78,202,Montoxv_a60d
L,81146.25,124,Montoxv_a90d
L,471730.2,124,Montoxv_a186d
E,667812.84,124,Montoxv_a186d", header = TRUE)