Transforming a data frame in R to apply a pivot table

Time: 2019-11-08 05:00:39

Tags: r dataframe

I have a data frame like the following:

       Red  Green   Black
John    A   B       C
Sean    A   D       C
Tim     B   C       C

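How can I transform it into the following form so that I can apply a pivot table (or can this be done directly in R without transforming the data)?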

Names   Code    Type
John    Red     A
John    Green   B
John    Black   C
Sean    Red     A
Sean    Green   D
Sean    Black   C
Tim     Red     B
Tim     Green   C
Tim     Black   C
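A minimal base-R sketch of this reshaping, assuming the names are stored as row names (as in the dput() output used in the answers below); the object names dat and long are only illustrative. Transposing the matrix and reading it column by column yields the rows in exactly the order shown above: all of John's colours first, then Sean's, then Tim's.

```r
# Example data (assumption): names as row names, one column per colour
dat <- data.frame(
  Red   = c("A", "A", "B"),
  Green = c("B", "D", "C"),
  Black = c("C", "C", "C"),
  row.names = c("John", "Sean", "Tim"),
  stringsAsFactors = FALSE
)

# t() puts the colours in rows and the people in columns; as.vector() then
# reads column-major, i.e. one person at a time: Red, Green, Black
long <- data.frame(
  Names = rep(rownames(dat), each = ncol(dat)),
  Code  = rep(names(dat), times = nrow(dat)),
  Type  = as.vector(t(as.matrix(dat))),
  stringsAsFactors = FALSE
)
long
```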

So my final goal is to count the types with a pivot table on the transformed data frame, like this:

Count of Code for each type:

Row Labels   A   B   C   D   Grand Total
John         1   1   1       3
Sean         1       1   1   3
Tim              1   2       3
Grand Total  2   2   4   1   9
Reading similar topics did not help much.

Thanks in advance!
Regards
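On the "directly, without transforming" part of the question: a base-R sketch (again assuming the names are row names; counts and pivot are illustrative names) that skips building a long data frame and passes the stacked values straight to table(). unlist() stacks the columns one after another (Red, then Green, then Black), so the row names are repeated once per column to stay aligned. addmargins() then appends a "Sum" row and column, which play the role of "Grand Total".

```r
dat <- data.frame(
  Red   = c("A", "A", "B"),
  Green = c("B", "D", "C"),
  Black = c("C", "C", "C"),
  row.names = c("John", "Sean", "Tim"),
  stringsAsFactors = FALSE
)

# unlist() stacks the columns in order (Red, Green, Black), so the names are
# repeated once per column to line up with the stacked values
counts <- table(
  Names = rep(rownames(dat), times = ncol(dat)),
  Type  = unlist(dat, use.names = FALSE)
)

# Append row/column totals ("Sum" corresponds to "Grand Total")
pivot <- addmargins(counts)
pivot
#       Type
# Names  A B C D Sum
#   John 1 1 1 0   3
#   Sean 1 0 1 1   3
#   Tim  0 1 2 0   3
#   Sum  2 2 4 1   9
```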

3 answers:

Answer 0 (score: 1)

Using a literal dump (dput() output) of the first, matrix-like frame above:

dat <- structure(list(Red = c("A", "A", "B"), Green = c("B", "D", "C"
), Black = c("C", "C", "C")), class = "data.frame", row.names = c("John", 
"Sean", "Tim"))

I can do this:

library(dplyr)
library(tidyr)
tibble::rownames_to_column(dat, var = "Names") %>%
  gather(Code, Type, -Names)
#   Names  Code Type
# 1  John   Red    A
# 2  Sean   Red    A
# 3   Tim   Red    B
# 4  John Green    B
# 5  Sean Green    D
# 6   Tim Green    C
# 7  John Black    C
# 8  Sean Black    C
# 9   Tim Black    C
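gather() has since been superseded by pivot_longer(), introduced in tidyr 1.0.0. A sketch of the equivalent call with the same column names (long is an illustrative name); note that pivot_longer() emits the rows person by person, which matches the order asked for in the question. tidyr re-exports %>%, so dplyr does not need to be attached here.

```r
library(tidyr)  # pivot_longer(); also re-exports %>%

dat <- structure(list(Red = c("A", "A", "B"), Green = c("B", "D", "C"
), Black = c("C", "C", "C")), class = "data.frame", row.names = c("John",
"Sean", "Tim"))

# One row per (name, colour) pair, ordered by name
long <- tibble::rownames_to_column(dat, var = "Names") %>%
  pivot_longer(-Names, names_to = "Code", values_to = "Type")
long
```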

We can extend this to your next goal:

tibble::rownames_to_column(dat, var = "Names") %>%
  gather(Code, Type, -Names) %>%
  xtabs(~ Names + Type, data = .)
#       Type
# Names  A B C D
#   John 1 1 1 0
#   Sean 1 0 1 1
#   Tim  0 1 2 0

Then all that is left is to add the margins:

tibble::rownames_to_column(dat, var = "Names") %>%
  gather(Code, Type, -Names) %>%
  xtabs(~ Names + Type, data = .) %>%
  addmargins()
#       Type
# Names  A B C D Sum
#   John 1 1 1 0   3
#   Sean 1 0 1 1   3
#   Tim  0 1 2 0   3
#   Sum  2 2 4 1   9
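If the result should end up as an ordinary data frame (for example to write it out with write.csv()), one option is as.data.frame.matrix(), which keeps the wide layout; plain as.data.frame() on a table would instead return long Names/Type/Freq rows. A base-R sketch, with the long data written out by hand so that it runs without tidyr (tab and wide are illustrative names):

```r
# Long data as produced by gather() above, written out by hand
long <- data.frame(
  Names = rep(c("John", "Sean", "Tim"), times = 3),
  Code  = rep(c("Red", "Green", "Black"), each = 3),
  Type  = c("A", "A", "B", "B", "D", "C", "C", "C", "C"),
  stringsAsFactors = FALSE
)

tab <- addmargins(xtabs(~ Names + Type, data = long))

# as.data.frame.matrix() keeps the cross-tab layout: one row per name,
# one column per Type, plus the "Sum" row and column
wide <- as.data.frame.matrix(tab)
wide
```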

Answer 1 (score: 0)

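You can use reshape(). I'm not sure how your data is structured, i.e. whether the names are in a column or are the row names, so I have added both versions.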

reshape(dat1, idvar="Names",
        varying=2:4,
        v.names="Type", direction="long",
        timevar="Code", times=c("red", "green", "black"),
        new.row.names=1:9)

reshape(transform(dat2, Names=rownames(dat2)), idvar="Names",
        varying=1:3,
        v.names="Type", direction="long",
        timevar="Code", times=c("red", "green", "black"),
        new.row.names=1:9)

#   Names  Code Type
# 1  John   red    A
# 2  Sean   red    A
# 3   Tim   red    B
# 4  John green    B
# 5  Sean green    D
# 6   Tim green    C
# 7  John black    C
# 8  Sean black    C
# 9   Tim black    C

To get a more basic version, without supplying times, you can do the following:

res <- reshape(transform(dat2, Names=rownames(dat2)), idvar="Names",
               varying=1:3,
               v.names="Type", direction="long",
               timevar="Code")
res
#        Names Code Type
# John.1  John    1    A
# Sean.1  Sean    1    A
# Tim.1    Tim    1    B
# John.2  John    2    B
# Sean.2  Sean    2    D
# Tim.2    Tim    2    C
# John.3  John    3    C
# Sean.3  Sean    3    C
# Tim.3    Tim    3    C

Afterwards, you can convert "Code" into a factor column and assign whatever labels you like:

res$Code <- factor(res$Code, labels=c("red", "green", "black"))
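Typing the labels by hand only works if they are listed in the same order as the columns passed to varying. A variation (an editorial sketch, not part of the original answer; wide_cols is an illustrative name) takes them from the column names instead, since Code here is just the position within varying:

```r
dat2 <- data.frame(
  Red   = c("A", "A", "B"),
  Green = c("B", "D", "C"),
  Black = c("C", "C", "C"),
  row.names = c("John", "Sean", "Tim"),
  stringsAsFactors = FALSE
)

wide_cols <- 1:3  # the columns being stacked
res <- reshape(transform(dat2, Names = rownames(dat2)), idvar = "Names",
               varying = wide_cols,
               v.names = "Type", direction = "long",
               timevar = "Code")

# Code is 1, 2, 3 (position within 'varying'), so the matching
# column names are the natural labels, in the right order
res$Code <- factor(res$Code, levels = seq_along(wide_cols),
                   labels = names(dat2)[wide_cols])
res
```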

Data

dat1 <- structure(list(Names = c("John", "Sean", "Tim"), Red = c("A", 
"A", "B"), Green = c("B", "D", "C"), Black = c("C", "C", "C")), row.names = c(NA, 
-3L), class = "data.frame")

dat2 <- structure(list(Red = c("A", "A", "B"), Green = c("B", "D", "C"
), Black = c("C", "C", "C")), row.names = c("John", "Sean", "Tim"
), class = "data.frame")

Answer 2 (score: 0)

What you want is to (1) create a contingency table and then (2) compute the sums of the table entries across rows and columns.

Step 1: Create a contingency table

I first pivot the data with pivot_longer() instead of gather(), because it is more intuitive. Then I apply table() to the two variables you are interested in.


library(tidyr)  # pivot_longer(); also re-exports %>%

# Toy example
df <- structure(list(Red = c("A", "A", "B"), Green = c("B", "D", "C"
), Black = c("C", "C", "C")), class = "data.frame", row.names = c("John",
"Sean", "Tim"))

# Pivot the data: one row per name and colour
long_df <- tibble::rownames_to_column(df, var = "Names") %>%
  pivot_longer(cols = -Names,
               names_to = "Code",
               values_to = "Type")

# Create a contingency table of names by type
df_table <- table(long_df$Names, long_df$Type)

Step 2: Compute the row and column totals

Again, I only use the base R function margin.table(). This approach also lets you save the row and column totals for further analysis.

# Row totals (margin = 1: one total per name)
df_table %>%
  margin.table(margin = 1)

# Column totals (margin = 2: one total per type)
df_table %>%
  margin.table(margin = 2)
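margin.table() returns each set of totals as a separate object. To get them together, laid out like the pivot table in the question (row totals, column totals and grand total), base R's addmargins() can be applied to the same kind of table. A base-R sketch with the long data in the question's layout (Names, Code, Type) written out by hand so it runs without tidyr; the object names are illustrative:

```r
# Long data in the question's layout, written out by hand
long_df <- data.frame(
  Names = rep(c("John", "Sean", "Tim"), each = 3),
  Code  = rep(c("Red", "Green", "Black"), times = 3),
  Type  = c("A", "B", "C", "A", "D", "C", "B", "C", "C"),
  stringsAsFactors = FALSE
)

# Named arguments give the table labelled dimensions
df_table <- table(Names = long_df$Names, Type = long_df$Type)

row_totals  <- margin.table(df_table, margin = 1)  # one total per name
col_totals  <- margin.table(df_table, margin = 2)  # one total per Type
grand_total <- sum(df_table)

# All margins in one table; the "Sum" row/column is the "Grand Total"
full <- addmargins(df_table)
full
```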