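Extract rows from a single column to form two new columns

Time: 2017-06-14 12:26:17

Tags: r dplyr tidyr tidyverse

Update: I realized that the dummy data frame I originally created does not reflect the structure of the data frame I am actually working with. Please allow me to restate my question here.

The data frame I am starting with: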

StudentAndClass <- c("Anthropology College_Name","x","y",
"Geology College_Name","z","History College_Name", "x","y","z")
df <- data.frame(StudentAndClass)

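The students ("x", "y", "z") are enrolled in the class listed above them. For example, "x" and "y" are in Anthropology, "z" is in Geology, and "x", "y", "z" are in History.

How can I create the desired data frame below?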

Student <- c("x", "y", "z", "x", "y","z")
Class <- c("Anthropology College_Name", "Anthropology College_Name",
"Geology College_Name", "History College_Name",
"History College_Name", "History College_Name")
df_tidy <- data.frame(Student, Class)

Original post:

I have a data frame in which the observations of two variables are merged into a single column, like this:

StudentAndClass <- c("A","x","y","A","B","z","B","C","x","y","z","C")
df <- data.frame(StudentAndClass)

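Here "A", "B", "C" represent classes, and "x", "y", "z" are the students attending those classes. Note that the student observations are nested between the class observations.

I would like to know how to create a new data frame in the following format: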

Student <- c("x", "y", "z", "x", "y","z")
Class <- c("A", "A", "B", "C", "C", "C")
df_tidy <- data.frame(Student, Class)

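I want to extract the rows containing student observations and place them in a new column, while making sure each Student observation is paired with the corresponding class in the Class column.

2 Answers:

Answer 0: (score: 2)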

One option is to create a vector of the student values:

v1 <- c('x', 'y', 'z')

Then split the data on a logical vector and cbind the two pieces:

setNames(do.call(cbind, split(df, !df[,1] %in% v1)), c('Student', 'Class'))
#   Student Class
#2        x     A
#3        y     A
#6        z     B
#9        x     B
#10       y     C
#11       z     C

Or with tidyverse:

library(tidyverse)
df %>%
   group_by(grp = c('Class', 'Student')[(StudentAndClass %in% v1) + 1]) %>%
   mutate(n = row_number())  %>%
   spread(grp, StudentAndClass) %>% 
   select(-n)
# A tibble: 6 x 2
#   Class Student
#* <fctr>  <fctr>
#1      A       x
#2      A       y
#3      B       z
#4      B       x
#5      C       y
#6      C       z

Update

If the result needs to be based on the information between each pair of identical elements of 'LETTERS', the pairing has to follow those boundaries rather than row position.
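The outputs shown above pair students and classes by row position. For the layout in the original post, where each class label brackets its students, one way to pair every student with the enclosing class is sketched below in base R. This is an illustrative assumption rather than code from the answer: it reuses `v1` from the answer, the names `is_student`, `last_class_idx` and `paired` are made up here, and it assumes the column starts with a class label.

```r
# Sketch only: pair each student with the class label most recently seen
# above it. Assumes the first element is a class label.
StudentAndClass <- c("A", "x", "y", "A", "B", "z", "B", "C", "x", "y", "z", "C")
v1 <- c('x', 'y', 'z')  # known student values, as in the answer

is_student <- StudentAndClass %in% v1

# For every position, the index of the latest class row seen so far:
# student rows contribute 0, class rows contribute their own index,
# and cummax carries the latest class index forward.
last_class_idx <- cummax(ifelse(is_student, 0L, seq_along(StudentAndClass)))

paired <- data.frame(
  Student = StudentAndClass[is_student],
  Class   = StudentAndClass[last_class_idx[is_student]],
  stringsAsFactors = FALSE
)
```

Because only the nearest class label above each student is used, the closing copy of each label is simply ignored, so the same sketch should also cope with data where the closing label is missing.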

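Answer 1: (score: 2)

Update

Essentially, you just need to find which indices hold college names, use those indices to get the range of students for each college, and then subset the main vector by those ranges. Since students are no longer guaranteed to be nested between two similar values, you have to be careful about any "empty" colleges.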

college_indices <- which(endsWith(StudentAndClass, 'College_Name'))
colleges <- StudentAndClass[college_indices]
bounds_mat <- rbind(
  start = college_indices,
  # The last college runs to the end of the vector, so use length + 1 here;
  # the "- 1" below then lands on the last element instead of dropping it
  end   = c(college_indices[-1], length(StudentAndClass) + 1)
)
colnames(bounds_mat) <- colleges
bounds_mat['start', ] <- bounds_mat['start', ] + 1
bounds_mat['end',   ] <- bounds_mat['end',   ] - 1

# This prevents any problems if a college has no listed students
empty_college <- bounds_mat['start', ] > bounds_mat['end', ]
bounds_mat <- bounds_mat[, !empty_college, drop = FALSE]

# lapply always returns a list; apply would return a matrix whenever
# every college happens to have the same number of students
class_listing <- lapply(
  seq_len(ncol(bounds_mat)),
  function(j) {
    StudentAndClass[bounds_mat['start', j]:bounds_mat['end', j]]
  }
)
names(class_listing) <- colnames(bounds_mat)
df_tidy <- data.frame(
  Student = unlist(class_listing),
  Class = rep(names(class_listing), lengths(class_listing)),
  row.names = NULL
)
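
As a cross-check on the range-based approach above, the same pairing can be sketched with a running count of college rows. This is a minimal base R sketch under stated assumptions, not the answerer's code: the names `is_college`, `college_id` and `df_check` are made up for illustration, and the vector is assumed to start with a college row.

```r
# Sketch only: label every row with the number of college rows seen so far,
# then look up each student's college by that number.
StudentAndClass <- c("Anthropology College_Name", "x", "y",
                     "Geology College_Name", "z",
                     "History College_Name", "x", "y", "z")

is_college <- endsWith(StudentAndClass, "College_Name")

# cumsum over the logical vector increases by 1 at every college row,
# so each student row carries the id of the college listed above it
college_id <- cumsum(is_college)

df_check <- data.frame(
  Student = StudentAndClass[!is_college],
  Class   = StudentAndClass[is_college][college_id[!is_college]],
  stringsAsFactors = FALSE
)
```

A college with no students contributes no rows here, so no separate "empty college" step is needed in this variant.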