How to create a pivot table in R with more than 3 variables

Date: 2013-09-04 20:06:59

Tags: r reshape

I'm having trouble creating a pivot table from a data frame like this one:

c1   c2          c3         c4
E   5.76         201    A la vista
E   47530.71     201    A la vista
E   82.85        201    A la vista
L   11376.55     201    A la vista
E   6683.37      203    A la vista
E   66726.52     203    A la vista
E   2.39         203    A la vista
E   79066.07     202    Montoxv_a60d
E   14715.71     202    Montoxv_a60d
E   22661.78     202    Montoxv_a60d
L   81146.25     124    Montoxv_a90d
L   471730.2     124    Montoxv_a186d
E   667812.84    124    Montoxv_a186d

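My problem is that I don't know how to create a pivot table or summary table in R with four variables, where the rows of the final table are the combinations of the levels of c1 and c3 and the columns are the levels of c4. For each row and column combination, the values of c2 must be aggregated by sum. I would like to get something like this: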

       A la vista   Montoxv_a60d   Montoxv_a186d  Montoxv_a90d
E 201    47619.32       0               0               0  
E 203    73412.28       0               0               0 
E 202    0           116443.56          0               0      
E 124    0              0            667812.84          0 
L 201    11376.55       0               0               0
L 124    0              0            471730.2         81146.25 

7 Answers:

Answer 0 (score: 14)

You can do this with dcast from the reshape2 package:

dcast(mydata, c1 + c3 ~ c4, value.var="c2", fun.aggregate=sum)

For example:

library(reshape2)
# reproducible version of your data
mydata = read.csv(text="c1,c2,c3,c4
    E,5.76,201,A la vista
    E,47530.71,201,A la vista
    E,82.85,201,A la vista
    L,11376.55,201,A la vista
    E,6683.37,203,A la vista
    E,66726.52,203,A la vista
    E,2.39,203,A la vista
    E,79066.07,202,Montoxv_a60d
    E,14715.71,202,Montoxv_a60d
    E,22661.78,202,Montoxv_a60d
    L,81146.25,124,Montoxv_a90d
    L,471730.2,124,Montoxv_a186d
    E,667812.84,124,Montoxv_a186d", header=TRUE)
result = dcast(mydata, c1 + c3 ~ c4, value.var="c2", fun.aggregate=sum)

which produces

  c1  c3 A la vista Montoxv_a186d Montoxv_a60d Montoxv_a90d
1  E 124       0.00      667812.8          0.0         0.00
2  E 201   47619.32           0.0          0.0         0.00
3  E 202       0.00           0.0     116443.6         0.00
4  E 203   73412.28           0.0          0.0         0.00
5  L 124       0.00      471730.2          0.0     81146.25
6  L 201   11376.55           0.0          0.0         0.00

Answer 1 (score: 3)

ftable(with(mydata, tapply(c2, list(c1,c3,c4), sum) ) )

           A la vista Montoxv_a186d Montoxv_a60d Montoxv_a90d

    E 124          NA     667812.84           NA           NA
      201    47619.32            NA           NA           NA
      202          NA            NA    116443.56           NA
      203    73412.28            NA           NA           NA
    L 124          NA     471730.20           NA     81146.25
      201    11376.55            NA           NA           NA
      202          NA            NA           NA           NA
      203          NA            NA           NA           NA
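The NA cells above are the c1/c3/c4 combinations that have no observations. If zeros are preferred, as in the table the question asks for, one possible sketch in base R is to replace the NA entries before calling ftable. The strip.white = TRUE argument is my own addition here so that the indented sample lines do not leave leading spaces in c1.

```r
# Sample data from the question; strip.white = TRUE trims the indentation.
mydata <- read.csv(text = "c1,c2,c3,c4
    E,5.76,201,A la vista
    E,47530.71,201,A la vista
    E,82.85,201,A la vista
    L,11376.55,201,A la vista
    E,6683.37,203,A la vista
    E,66726.52,203,A la vista
    E,2.39,203,A la vista
    E,79066.07,202,Montoxv_a60d
    E,14715.71,202,Montoxv_a60d
    E,22661.78,202,Montoxv_a60d
    L,81146.25,124,Montoxv_a90d
    L,471730.2,124,Montoxv_a186d
    E,667812.84,124,Montoxv_a186d", header = TRUE, strip.white = TRUE)

# Sum c2 within every c1 x c3 x c4 cell; empty cells come back as NA.
tab <- with(mydata, tapply(c2, list(c1, c3, c4), sum))

# Treat "no observations" as a sum of zero.
tab[is.na(tab)] <- 0

ftable(tab)
```

Recent versions of R also appear to accept a default argument in tapply (for example default = 0), which should give the same result without the extra replacement step.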

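Answer 2 (score: 2)

Here are a few options: two in base R, and one using the newer "dplyr" and "tidyr" packages.

Base R's reshape can't handle aggregation, so you need to resort to other functions (for example, aggregate) before doing the reshaping.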

reshape(
  aggregate(c2 ~ c1 + c3 + c4, mydata, sum), 
  direction = "wide", idvar = c("c1", "c3"), timevar = "c4")
#      c1  c3 c2.A la vista c2.Montoxv_a186d c2.Montoxv_a60d c2.Montoxv_a90d
# 1     E 201      47619.32               NA              NA              NA
# 2     L 201      11376.55               NA              NA              NA
# 3     E 203      73412.28               NA              NA              NA
# 4     E 124            NA         667812.8              NA              NA
# 5     L 124            NA         471730.2              NA        81146.25
# 6     E 202            NA               NA        116443.6              NA
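If the "c2." prefix and the NA cells in that result are unwanted, a small cleanup step can bring it closer to the table in the question. This is only a sketch: the intermediate names agg and wide are mine, and strip.white = TRUE is added so that c1 holds "E" and "L" without leading spaces.

```r
# Sample data from the question; strip.white = TRUE trims the indentation.
mydata <- read.csv(text = "c1,c2,c3,c4
    E,5.76,201,A la vista
    E,47530.71,201,A la vista
    E,82.85,201,A la vista
    L,11376.55,201,A la vista
    E,6683.37,203,A la vista
    E,66726.52,203,A la vista
    E,2.39,203,A la vista
    E,79066.07,202,Montoxv_a60d
    E,14715.71,202,Montoxv_a60d
    E,22661.78,202,Montoxv_a60d
    L,81146.25,124,Montoxv_a90d
    L,471730.2,124,Montoxv_a186d
    E,667812.84,124,Montoxv_a186d", header = TRUE, strip.white = TRUE)

# Step 1: aggregate, because reshape() itself cannot.
agg <- aggregate(c2 ~ c1 + c3 + c4, mydata, sum)

# Step 2: long to wide, one column per level of c4.
wide <- reshape(agg, direction = "wide",
                idvar = c("c1", "c3"), timevar = "c4")

# Step 3: drop the "c2." prefix that reshape() adds to the new columns.
names(wide) <- sub("^c2\\.", "", names(wide))

# Step 4: cells with no observations become 0 instead of NA.
wide[is.na(wide)] <- 0

# Step 5: sort rows by c1, then c3, for easier reading.
wide <- wide[order(wide$c1, wide$c3), ]
wide
```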

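If your aggregation only involves sums, you can also use xtabs to do the aggregation. Since you have multiple variables on the RHS of the formula, you'll end up with a multi-dimensional array, but it can easily be coerced into a rectangular form with ftable (as @BondedDust did in his answer). Note that the ftable output differs slightly from the others, because by default it returns every combination of the grouping variables, even when that produces rows that are entirely empty.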

ftable(xtabs(c2 ~ c1 + c3 + c4, mydata))
#           c4 A la vista Montoxv_a186d Montoxv_a60d Montoxv_a90d
# c1    c3                                                       
#     E 124          0.00     667812.84         0.00         0.00
#       201      47619.32          0.00         0.00         0.00
#       202          0.00          0.00    116443.56         0.00
#       203      73412.28          0.00         0.00         0.00
#     L 124          0.00     471730.20         0.00     81146.25
#       201      11376.55          0.00         0.00         0.00
#       202          0.00          0.00         0.00         0.00
#       203          0.00          0.00         0.00         0.00
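If the all-zero rows (L 202 and L 203 here) are not wanted, one way around them is to collapse c1 and c3 into a single row key that only contains the combinations actually present in the data. The sketch below does this with interaction(); the column name key is made up for illustration, and strip.white = TRUE is my addition to trim the indented sample lines.

```r
# Sample data from the question; strip.white = TRUE trims the indentation.
mydata <- read.csv(text = "c1,c2,c3,c4
    E,5.76,201,A la vista
    E,47530.71,201,A la vista
    E,82.85,201,A la vista
    L,11376.55,201,A la vista
    E,6683.37,203,A la vista
    E,66726.52,203,A la vista
    E,2.39,203,A la vista
    E,79066.07,202,Montoxv_a60d
    E,14715.71,202,Montoxv_a60d
    E,22661.78,202,Montoxv_a60d
    L,81146.25,124,Montoxv_a90d
    L,471730.2,124,Montoxv_a186d
    E,667812.84,124,Montoxv_a186d", header = TRUE, strip.white = TRUE)

# Combine c1 and c3 into one factor; drop = TRUE keeps only the
# c1/c3 pairs that occur in the data ("key" is a hypothetical name).
mydata$key <- interaction(mydata$c1, mydata$c3, sep = " ", drop = TRUE)

# Two-way table: rows are the observed c1/c3 pairs, columns are c4,
# cells are the sums of c2 (0 where there are no observations).
pivot <- xtabs(c2 ~ key + c4, mydata)
pivot
```

The result is an ordinary two-dimensional table with six rows, so it can be passed on to as.data.frame.matrix() or similar if a data frame is needed.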

Finally, you can also use "tidyr" and "dplyr", which offer functionality similar to the tools in "reshape" and "reshape2", but with a slightly different syntax.

library(tidyr)
library(dplyr)
mydata %>%                     ## The source dataset
  group_by(c1, c3, c4) %>%     ## Grouping variables
  summarise(c2 = sum(c2)) %>%  ## aggregation of the c2 column
  ungroup() %>%                ## spread doesn't seem to like groups
  spread(c4, c2)               ## spread makes the data wide
# Source: local data frame [6 x 6]
# 
#      c1  c3 A la vista Montoxv_a186d Montoxv_a60d Montoxv_a90d
# 1     E 124         NA      667812.8           NA           NA
# 2     E 201   47619.32            NA           NA           NA
# 3     E 202         NA            NA     116443.6           NA
# 4     E 203   73412.28            NA           NA           NA
# 5     L 124         NA      471730.2           NA     81146.25
# 6     L 201   11376.55            NA           NA           NA

Answer 3 (score: 0)

With rpivotTable you can pivot interactively, much as you would in Excel.

install.packages("rpivotTable")
library(rpivotTable) 
data(mtcars)
rpivotTable(mtcars)

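Answer 4 (score: 0)

This can also be produced quite easily with the pivottabler package, using either the one-line quick pivot functions or the more verbose syntax: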

df <- read.csv(text="c1,c2,c3,c4
    E,5.76,201,A la vista
    E,47530.71,201,A la vista
    E,82.85,201,A la vista
    L,11376.55,201,A la vista
    E,6683.37,203,A la vista
    E,66726.52,203,A la vista
    E,2.39,203,A la vista
    E,79066.07,202,Montoxv_a60d
    E,14715.71,202,Montoxv_a60d
    E,22661.78,202,Montoxv_a60d
    L,81146.25,124,Montoxv_a90d
    L,471730.2,124,Montoxv_a186d
    E,667812.84,124,Montoxv_a186d", header=TRUE)

# quick pivot syntax
library(pivottabler)
qhpvt(df, c("c1","c3"), "c4", "sum(c2)", totals="NONE")

# verbose syntax
library(pivottabler)
pt <- PivotTable$new()
pt$addData(df) 
pt$addColumnDataGroups("c4", addTotal=FALSE)
pt$addRowDataGroups("c1", addTotal=FALSE)
pt$addRowDataGroups("c3", addTotal=FALSE)
pt$defineCalculation(calculationName="calc1", summariseExpression="sum(c2)")
pt$renderPivot()

Output: [image of the rendered pivot table]

For more information on the pivottabler package, see: http://pivottabler.org.uk/articles/v01-introduction.html

Note: I am the author of the package.

Answer 5 (score: 0)

This is easily done with the pivot_wider function from tidyr:

library(tidyr)
tidyr::pivot_wider(data = df, id_cols = c(c1, c3), names_from = c4, values_from = c2, values_fn = sum)

# A tibble: 6 x 6
  c1         c3 `A la vista` Montoxv_a60d Montoxv_a90d Montoxv_a186d
  <chr>   <int>        <dbl>        <dbl>        <dbl>         <dbl>
1 "    E"   201       47619.          NA           NA            NA 
2 "    L"   201       11377.          NA           NA            NA 
3 "    E"   203       73412.          NA           NA            NA 
4 "    E"   202          NA       116444.          NA            NA 
5 "    L"   124          NA           NA        81146.       471730.
6 "    E"   124          NA           NA           NA        667813.


Answer 6 (score: 0)

The data.table package, like reshape2, provides the functions melt() and dcast() for this kind of operation. So you can do this:

mydata = read.csv(text = "c1,c2,c3,c4
    E,5.76,201,A la vista
    E,47530.71,201,A la vista
    E,82.85,201,A la vista
    L,11376.55,201,A la vista
    E,6683.37,203,A la vista
    E,66726.52,203,A la vista
    E,2.39,203,A la vista
    E,79066.07,202,Montoxv_a60d
    E,14715.71,202,Montoxv_a60d
    E,22661.78,202,Montoxv_a60d
    L,81146.25,124,Montoxv_a90d
    L,471730.2,124,Montoxv_a186d
    E,667812.84,124,Montoxv_a186d", header = TRUE)

require(data.table)
setDT(mydata)  # convert the data frame to a data.table by reference
dcast(mydata, c1 + c3 ~ c4, value.var = "c2", fun.aggregate = sum)

This is also the fastest solution.


Data from @david-robinson.
