Tidying problem with an R tibble

Time: 2019-01-07 21:58:14

Tags: r

I'm trying to tidy the data in an R script so that I can run some statistical analyses on the tidied data set.

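One of the columns lists pairs (six of them), which correspond to three separate "blocks" of output values. A minimal reproducible dataset is below.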

dput(head(data, 6)) 
structure(list(pairs = c("ABC", "ACB", "BAC", "BCA", "CBA", "CAB"), block1vals = c(1, 3, 5, 7, 9, 10), block2vals = c(4, 66, 34, 66, 21, 21), block3vals = c(53, 22, 12, 65, 21, 22)), .Names = c("pairs", "block1vals", "block2vals", "block3vals"), row.names = c(NA, 6L), class = "data.frame")

I got my code to take the pairs and label each participant's A/B/C value for a given block, with one column per block; this works:

Block 1:

data$block1types <- sapply(data$pairs, function(x){
  if(x == "ABC") { return("Type A")}
  if(x == "ACB") { return("Type A")}
  if(x == "BAC") { return("Type B")}
  if(x == "BCA") { return("Type B")}
  if(x == "CBA") { return("Type C")}
  if(x == "CAB") { return("Type C")}
})

Block 2:

data$block2types <- sapply(data$pairs, function(x){
  if(x == "ABC") { return("Type B")}
  if(x == "ACB") { return("Type C")}
  if(x == "BAC") { return("Type A")}
  if(x == "BCA") { return("Type C")}
  if(x == "CBA") { return("Type B")}
  if(x == "CAB") { return("Type A")}
})

Block 3:

data$block3types <- sapply(data$pairs, function(x){
  if(x == "ABC") { return("Type C")}
  if(x == "ACB") { return("Type B")}
  if(x == "BAC") { return("Type C")}
  if(x == "BCA") { return("Type A")}
  if(x == "CBA") { return("Type A")}
  if(x == "CAB") { return("Type B")}
})
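As a side note, each if-chain can be replaced by a named lookup vector, which maps every row in one vectorized step. This is a minimal sketch for block 2 only; the name `block2map` is just illustrative, and the same pattern would apply to blocks 1 and 3.

```r
data <- data.frame(
  pairs = c("ABC", "ACB", "BAC", "BCA", "CBA", "CAB"),
  stringsAsFactors = FALSE
)

# Names are the pairs codes, values are the block-2 types
block2map <- c(ABC = "Type B", ACB = "Type C", BAC = "Type A",
               BCA = "Type C", CBA = "Type B", CAB = "Type A")

# Indexing by name looks up every row at once; unname() drops the
# names that would otherwise be carried over from the lookup vector
data$block2types <- unname(block2map[data$pairs])
print(data$block2types)
```

One difference from the if-chain: a code missing from the lookup vector gives NA, whereas the sapply() version returns NULL for it and turns the whole result into a list.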

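What I want to do now is reorganize the data so that there is one column containing all the "Type A" participant values (regardless of which block the A was in), one for "Type B", and one for "Type C".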

So the ideal output would be:

data$TypeA <- c(1, 3, 34, 65, 21, 21)
data$TypeB <- c(4, 22, 5, 7, 21, 22)
data$TypeC <- c(53, 66, 12, 66, 9, 10)

I don't know how to approach this. My attempt was to create two columns outside the data set, hoping I could then spread them:

BlockTypes<- combine(data$block1types, data$block2types, data$block3types, .id = NULL)     
BlockTotals<- combine(data$block1vals, data$block2vals, data$block3vals, .id = NULL) 

Then I tried this:

spread(data, key= BlockTypes, value=BlockTotals, fill = 0)

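This fails with: "var must evaluate to a single number or a column name, not a character vector". That said, I think the bigger problem is that these columns live outside the data set. Since they aren't in the data set, I can't use spread() on them, and if combine() can't be used with tibbles, I'm stuck on how to do this.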
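One way around this is to keep everything inside the data frame: reshape to long format (one row per participant and block), derive the type from the block number, then widen back to one column per type. This is a sketch assuming tidyr >= 1.0.0, where pivot_longer() and pivot_wider() supersede gather() and spread(); the `id` column is an addition so that rows stay distinct even if a pairs code repeats in the real data.

```r
library(dplyr)
library(tidyr)

data <- tibble(
  pairs = c("ABC", "ACB", "BAC", "BCA", "CBA", "CAB"),
  block1vals = c(1, 3, 5, 7, 9, 10),
  block2vals = c(4, 66, 34, 66, 21, 21),
  block3vals = c(53, 22, 12, 65, 21, 22)
)

result <- data %>%
  # Row id so each participant's three values can be regrouped later
  mutate(id = row_number()) %>%
  # Long format: block number ("1", "2", "3") taken from the column name
  pivot_longer(
    cols = starts_with("block"),
    names_to = "block",
    names_pattern = "block(\\d)vals",
    values_to = "value"
  ) %>%
  # The block number is also the character position of the type in `pairs`
  mutate(block = as.integer(block),
         type = paste0("Type", substr(pairs, block, block))) %>%
  select(-block) %>%
  # Back to wide: one column per type
  pivot_wider(names_from = type, values_from = value)

print(result)
```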

2 answers:

Answer 0 (score: 1)

I'm sure there is a more elegant way to do this if I thought about it longer, but here is something that works.

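First, we use substr() to extract the first, second, and third characters of pairs as the types for blocks 1, 2, and 3. I use paste0() to prepend "Type " to each extracted letter. This is better than spelling out every combination as you did.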

Next, we go through the data three times (once per type). On each pass, we check the block types to decide which block's value to pull.

library(tidyverse)
data <- tibble(
  pairs = c("ABC", "ACB", "BAC", "BCA", "CBA", "CAB"),
  block1vals = c(1, 3, 5, 7, 9, 10),
  block2vals = c(4, 66, 34, 66, 21, 21),
  block3vals = c(53, 22, 12, 65, 21, 22)
)

data %>%
  mutate(
    block1types = paste0("Type ",substr(pairs, 1, 1)),
    block2types = paste0("Type ",substr(pairs, 2, 2)),
    block3types = paste0("Type ",substr(pairs, 3, 3))) %>%
  mutate(
    TypeAValues = case_when(
    block1types == "Type A" ~ block1vals,
    block2types == "Type A" ~ block2vals,
    block3types == "Type A" ~ block3vals)) %>%
  mutate(
    TypeBValues = case_when(
    block1types == "Type B" ~ block1vals,
    block2types == "Type B" ~ block2vals,
    block3types == "Type B" ~ block3vals)) %>%
  mutate(
    TypeCValues = case_when(
    block1types == "Type C" ~ block1vals,
    block2types == "Type C" ~ block2vals,
    block3types == "Type C" ~ block3vals))
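The three near-identical mutate() calls follow one rule: the k-th character of pairs is the type for block k. A minimal base-R sketch of that same substr() logic written as a loop, assuming exactly three blocks named block1vals to block3vals:

```r
data <- data.frame(
  pairs = c("ABC", "ACB", "BAC", "BCA", "CBA", "CAB"),
  block1vals = c(1, 3, 5, 7, 9, 10),
  block2vals = c(4, 66, 34, 66, 21, 21),
  block3vals = c(53, 22, 12, 65, 21, 22),
  stringsAsFactors = FALSE
)

for (type in c("A", "B", "C")) {
  out <- rep(NA_real_, nrow(data))
  for (k in 1:3) {
    # Rows whose k-th character is this type had that type in block k
    hit <- substr(data$pairs, k, k) == type
    out[hit] <- data[[paste0("block", k, "vals")]][hit]
  }
  data[[paste0("Type", type)]] <- out
}

print(data[, c("pairs", "TypeA", "TypeB", "TypeC")])
```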

Answer 1 (score: 0)

Here is an approach using the dplyr and stringr packages.

library(dplyr)
library(stringr)

data %>%
  # For each letter, determine the position of that letter in the entry in the 'pairs' column
  mutate(a = str_locate(pairs, 'A')[,'start'],
         b = str_locate(pairs, 'B')[,'start'],
         c = str_locate(pairs, 'C')[,'start']) %>% 
  # Based on the letter's position, pull the value from the appropriate column
  # (funs() is deprecated in current dplyr; a ~ lambda does the same job)
  mutate_at(.vars = vars(a, b, c),
            .funs = ~ case_when(. == 1 ~ block1vals,
                                . == 2 ~ block2vals,
                                . == 3 ~ block3vals))
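For comparison, here is a base-R sketch of the same position-lookup idea without stringr. It assumes every letter appears exactly once in each pairs string: regexpr() plays the role of str_locate(...)[, 'start'], and a two-column index matrix picks one cell per row.

```r
data <- data.frame(
  pairs = c("ABC", "ACB", "BAC", "BCA", "CBA", "CAB"),
  block1vals = c(1, 3, 5, 7, 9, 10),
  block2vals = c(4, 66, 34, 66, 21, 21),
  block3vals = c(53, 22, 12, 65, 21, 22),
  stringsAsFactors = FALSE
)

vals <- as.matrix(data[, c("block1vals", "block2vals", "block3vals")])

for (letter in c("A", "B", "C")) {
  # 1-based position of the letter in each string (-1 would mean no match)
  pos <- as.integer(regexpr(letter, data$pairs, fixed = TRUE))
  # Index matrix: row i, column pos[i] -> the block holding this letter
  data[[paste0("Type", letter)]] <- vals[cbind(seq_len(nrow(data)), pos)]
}

print(data[, c("pairs", "TypeA", "TypeB", "TypeC")])
```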

The str_locate() calls look odd because str_locate() returns a matrix, not a vector.

Here is what it returns:

pairs <- c('ABCDE')
str_locate(pairs, 'BC')

     start end
[1,]     2   3

To get only the position where the match starts (here, the letter "B"), extract the column named start from the matrix.

You can combine the str_locate() call with the column extraction like this:

str_locate(pairs, 'BC')[, 'start']
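The comma in [, 'start'] matters. A sketch using a hand-built 1 x 2 matrix shaped like the str_locate() output above (so stringr is not needed): with a single index, R treats the matrix as a plain vector, and since it has column names but no element names, 'start' matches nothing.

```r
# Stand-in for str_locate('ABCDE', 'BC'): columns "start" and "end"
m <- matrix(c(2L, 3L), nrow = 1, dimnames = list(NULL, c("start", "end")))

m[, "start"]  # selects the "start" column: 2
m["start"]    # single-index lookup by element name: NA
```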