Question

I have a dataset similar to the following:

Class     Status   Name
History   teacher  A
History   student  B
History   student  C
Geo       teacher  A
Geo       student  C
Bio       teacher  B
Bio       student  C

I would like to get a frequency cross-table showing each person's status and the classes in which it occurs:

Student\Teacher  A            B         C
A                   
B                History
C                History;Geo  Bio

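The idea is to measure how certain people tend to work together (for example, whether some students choose what to study because of the teacher or the type of class) and how often each person plays each role. Anyone can be a teacher or a student depending on the class; some people never study and others never teach.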
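For the second part (how often each person plays each role), base R's `table()` on `Name` against `Status` already gives one count per person and role. The following is a minimal sketch that assumes the data sit in a data frame named `df1` with the columns shown above (`df1` is the name the answers use). `prop.table()` then turns the counts into each person's share of appearances per role.

```r
# Example data from the question (df1 is an assumed name)
df1 <- data.frame(
  Class  = c("History", "History", "History", "Geo", "Geo", "Bio", "Bio"),
  Status = c("teacher", "student", "student", "teacher", "student", "teacher", "student"),
  Name   = c("A", "B", "C", "A", "C", "B", "C"),
  stringsAsFactors = FALSE
)

# One row per person, one column per role, cells = number of occurrences
role_counts <- table(Name = df1$Name, Status = df1$Status)
print(role_counts)
# A teaches twice, B teaches once and studies once, C only studies (three times)

# Share of each person's appearances spent in each role (rows sum to 1)
role_share <- prop.table(role_counts, margin = 1)
print(role_share)
```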

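I have tried various functions, especially table(), but I am stuck on the fact that a teacher often has several students, so I cannot see who works with whom. Unfortunately my dataset is very large, so doing this by hand is practically impossible.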
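`table()` on its own cannot show who works with whom because each row holds only one person. A minimal sketch that works around this first pairs every student with the teacher(s) of the same class via `merge()`, then cross-tabulates the pairs. It assumes that one `Class` value identifies one class, so two different History classes would need distinct labels. The names `students`, `teachers`, `pairs` and `people` are illustrative. Building factors over all names makes everyone appear as both a row and a column, including people who never study or never teach.

```r
df1 <- data.frame(
  Class  = c("History", "History", "History", "Geo", "Geo", "Bio", "Bio"),
  Status = c("teacher", "student", "student", "teacher", "student", "teacher", "student"),
  Name   = c("A", "B", "C", "A", "C", "B", "C"),
  stringsAsFactors = FALSE
)

# Keep Class and Name for each role separately
students <- df1[df1$Status == "student", c("Class", "Name")]
teachers <- df1[df1$Status == "teacher", c("Class", "Name")]

# Inner join on Class: one row per (student, teacher) pair sharing a class
pairs <- merge(students, teachers, by = "Class",
               suffixes = c(".student", ".teacher"))

# Use every name as a factor level so each person gets a row and a column
people <- sort(unique(df1$Name))
pair_counts <- table(Student = factor(pairs$Name.student, levels = people),
                     Teacher = factor(pairs$Name.teacher, levels = people))
print(pair_counts)
# C was taught twice by A and once by B; B was taught once by A
```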

I hope this is clear; please let me know if I should be more precise.

Answer 1

We can group_by "Name" and "Status", paste the unique elements of "Class" into a single string, and spread from "long" to "wide" format.

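library(tidyverse)
df1 %>% 
  group_by(Name, Status) %>%                                # one group per person and role
  summarise(Class = str_c(unique(Class), collapse=";")) %>% # that person's classes in that role, as one string
  spread(Name, Class)                                       # one column per person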
# A tibble: 2 x 4
#  Status  A           B       C              
#  <chr>   <chr>       <chr>   <chr>          
#1 student <NA>        History History;Geo;Bio
#2 teacher History;Geo Bio     <NA>
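`spread()` is superseded in current tidyr in favour of `pivot_wider()`. The sketch below shows the same reshaping with `pivot_wider()` and assumes dplyr 1.0 or later (for `.groups`) and tidyr 1.0 or later. The row order can differ from the `spread()` output above, since `pivot_wider()` keeps rows in order of first appearance instead of sorting them.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(
  Class  = c("History", "History", "History", "Geo", "Geo", "Bio", "Bio"),
  Status = c("teacher", "student", "student", "teacher", "student", "teacher", "student"),
  Name   = c("A", "B", "C", "A", "C", "B", "C"),
  stringsAsFactors = FALSE
)

wide <- df1 %>%
  group_by(Name, Status) %>%
  # collapse each person's classes in a given role into one string
  summarise(Class = paste(unique(Class), collapse = ";"), .groups = "drop") %>%
  # one column per person, one row per role
  pivot_wider(names_from = Name, values_from = Class)
print(wide)
```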

Data

df1 <- structure(list(Class = c("History", "History", "History", "Geo", 
"Geo", "Bio", "Bio"), Status = c("teacher", "student", "student", 
"teacher", "student", "teacher", "student"), Name = c("A", "B", 
"C", "A", "C", "B", "C")), class = "data.frame", row.names = c(NA, 
-7L))

Answer 2

Rearrange your data (using @akrun's data):

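# split the rows into a list of two data frames: students and teachers
L <- split(df1, df1$Status)
# join them on Class, pairing each student with the teacher(s) of the same class
newdf <- Reduce(function(x, y) merge(x, y, by="Class", all=TRUE), L)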
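Because `L` has exactly two elements (`student` and `teacher`, the levels of `Status` in alphabetical order), the `Reduce()` call amounts to a single `merge()`. The sketch below checks this on the example data. The name `newdf2` is illustrative. With `all = TRUE`, a class that had only a teacher or only students would still be kept, with NA in the missing columns.

```r
df1 <- data.frame(
  Class  = c("History", "History", "History", "Geo", "Geo", "Bio", "Bio"),
  Status = c("teacher", "student", "student", "teacher", "student", "teacher", "student"),
  Name   = c("A", "B", "C", "A", "C", "B", "C"),
  stringsAsFactors = FALSE
)

L <- split(df1, df1$Status)
print(names(L))  # the two groups, in sorted order

newdf <- Reduce(function(x, y) merge(x, y, by = "Class", all = TRUE), L)

# The same join written out: students first (.x columns), teachers second (.y columns)
newdf2 <- merge(L$student, L$teacher, by = "Class", all = TRUE)
print(identical(newdf, newdf2))
```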

Here is what it looks like:

    Class Status.x Name.x Status.y Name.y
1     Bio  student      C  teacher      B
2     Geo  student      C  teacher      A
3 History  student      B  teacher      A
4 History  student      C  teacher      A

Using the tidyverse:

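library(tidyverse)
result <- newdf %>% 
              group_by(Name.x, Name.y) %>%                                # one group per student (Name.x) / teacher (Name.y) pair
              summarise(Class = paste(Class, collapse=';')) %>%           # the classes they share, as one string
              complete(Name.x = LETTERS[1:3], Name.y = LETTERS[1:3]) %>%  # add missing pairs (names A-C hard-coded for this example)
              distinct() %>%                                              # drop any duplicated rows
              spread(Name.y, Class)                                       # teachers become columns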

Result

# A tibble: 3 x 4
# Groups:   Name.x [3]
  Name.x A           B     C
  <chr>  <chr>       <chr> <chr>
1 A      NA          NA    NA
2 B      History     NA    NA
3 C      Geo;History Bio   NA
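The `complete()` call above hard-codes the names as `LETTERS[1:3]`, which will not fit a large real dataset. Below is a base-R sketch that takes the names from the data instead. It pairs students and teachers as in the answer, then `tapply()` collapses the shared classes for each pair. Factor levels over all names keep empty rows and columns, which are filled with NA. The variable names are illustrative, and classes within a cell are sorted alphabetically, which is a choice made here.

```r
df1 <- data.frame(
  Class  = c("History", "History", "History", "Geo", "Geo", "Bio", "Bio"),
  Status = c("teacher", "student", "student", "teacher", "student", "teacher", "student"),
  Name   = c("A", "B", "C", "A", "C", "B", "C"),
  stringsAsFactors = FALSE
)

students <- df1[df1$Status == "student", c("Class", "Name")]
teachers <- df1[df1$Status == "teacher", c("Class", "Name")]
pairs <- merge(students, teachers, by = "Class",
               suffixes = c(".student", ".teacher"))

# All names, taken from the data rather than hard-coded
people <- sort(unique(df1$Name))

# Rows = students, columns = teachers, cells = shared classes joined by ";"
class_table <- tapply(
  pairs$Class,
  list(Student = factor(pairs$Name.student, levels = people),
       Teacher = factor(pairs$Name.teacher, levels = people)),
  function(x) paste(sort(unique(x)), collapse = ";")
)
print(class_table)
```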

Is there an R function for a frequency cross-table?
