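I'm trying to tidy the data in my R script so that I can run some statistical analyses on the tidied dataset.
One column lists pairs (six of them), which correspond to three separate "blocks" of output values. A minimal reproducible dataset is below.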
dput(head(data, 6))
structure(list(pairs = c("ABC", "ACB", "BAC", "BCA", "CBA", "CAB"), block1vals = c(1, 3, 5, 7, 9, 10), block2vals = c(4, 66, 34, 66, 21, 21), block3vals = c(53, 22, 12, 65, 21, 22)), .Names = c("pairs", "block1vals", "block2vals", "block3vals"), row.names = c(NA, 6L), class = "data.frame")
I've got my code to take the pairs and label each participant's A/B/C value for a given block, with one column per block; this works:
Block 1:
data$block1types <- sapply(data$pairs, function(x){
if(x == "ABC") { return("Type A")}
if(x == "ACB") { return("Type A")}
if(x == "BAC") { return("Type B")}
if(x == "BCA") { return("Type B")}
if(x == "CBA") { return("Type C")}
if(x == "CAB") { return("Type C")}
})
Block 2:
data$block2types <- sapply(data$pairs, function(x){
if(x == "ABC") { return("Type B")}
if(x == "ACB") { return("Type C")}
if(x == "BAC") { return("Type A")}
if(x == "BCA") { return("Type C")}
if(x == "CBA") { return("Type B")}
if(x == "CAB") { return("Type A")}
})
Block 3:
data$block3types <- sapply(data$pairs, function(x){
if(x == "ABC") { return("Type C")}
if(x == "ACB") { return("Type B")}
if(x == "BAC") { return("Type C")}
if(x == "BCA") { return("Type A")}
if(x == "CBA") { return("Type A")}
if(x == "CAB") { return("Type B")}
})
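What I want to do now is reorganize the data so that there is one column containing all of the "Type A" participant values (regardless of which block they were in), one for "Type B", and one for "Type C".
So the ideal output is: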
data$TypeA <- c(1, 3, 34, 65, 21, 21)
data$TypeB <- c(4, 22, 5, 7, 21, 22)
data$TypeC <- c(53, 66, 12, 66, 9, 10)
I'm not sure how to approach this. My attempt was to create two columns outside the dataset, hoping I could then spread them:
BlockTypes<- combine(data$block1types, data$block2types, data$block3types, .id = NULL)
BlockTotals<- combine(data$block1vals, data$block2vals, data$block3vals, .id = NULL)
Then I tried:
spread(data, key= BlockTypes, value=BlockTotals, fill = 0)
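This fails with the message "var must evaluate to a single number or a column name, not a character vector". I do think, though, that the bigger problem is that the columns are outside the dataset. Because they aren't in the dataset, I can't use spread() on them. So if combine() can't be used with a tibble, I'm confused about how to do this.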
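The stacking idea can work if the stacked vectors stay in one data frame together with a row identifier, so that they can be widened again afterwards. Below is a minimal base R sketch of that long-then-wide reshape, assuming the six-row data from the dput() output above; the names long, id, type, and value are illustrative choices, not from the question. In current tidyr the same two steps are usually written with pivot_longer() and pivot_wider(), which supersede spread().

```r
# Rebuild the example data from the question
data <- data.frame(
  pairs = c("ABC", "ACB", "BAC", "BCA", "CBA", "CAB"),
  block1vals = c(1, 3, 5, 7, 9, 10),
  block2vals = c(4, 66, 34, 66, 21, 21),
  block3vals = c(53, 22, 12, 65, 21, 22)
)

# Stack all three blocks into one long data frame (18 rows), keeping a
# row id so every value can be put back on its original row later.
long <- do.call(rbind, lapply(1:3, function(k) {
  data.frame(
    id    = seq_len(nrow(data)),
    type  = paste0("Type", substr(data$pairs, k, k)),  # k-th letter = type of block k
    value = data[[paste0("block", k, "vals")]]
  )
}))

# Widen: one row per id, one column per type. Each id/type cell holds
# exactly one value, so sum() simply returns that value.
wide <- tapply(long$value, list(long$id, long$type), sum)
data <- cbind(data, as.data.frame(wide))
data
```

Because id, type, and value live in the same data frame, the widening step always knows which row each value belongs to, which is what was missing when the combined vectors were kept outside the dataset.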
Answer 0 (score: 1)
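I'm sure there's a better way to do this if I put my mind to it, but here's something that works.
First, we use the substr() function to extract the first, second, and third characters for your types. I use paste0() to include the "Type " part in the extracted value. This is better than spelling out every combination the way you did.
Next, we go through the data three times (once for each type). Each time, we use the block types to decide which block's value to pull.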
library(tidyverse)
data <- tibble(
pairs = c("ABC", "ACB", "BAC", "BCA", "CBA", "CAB"),
block1vals = c(1, 3, 5, 7, 9, 10),
block2vals = c(4, 66, 34, 66, 21, 21),
block3vals = c(53, 22, 12, 65, 21, 22)
)
data %>%
mutate(
block1types = paste0("Type ",substr(pairs, 1, 1)),
block2types = paste0("Type ",substr(pairs, 2, 2)),
block3types = paste0("Type ",substr(pairs, 3, 3))) %>%
mutate(
TypeAValues = case_when(
block1types == "Type A" ~ block1vals,
block2types == "Type A" ~ block2vals,
block3types == "Type A" ~ block3vals)) %>%
mutate(
TypeBValues = case_when(
block1types == "Type B" ~ block1vals,
block2types == "Type B" ~ block2vals,
block3types == "Type B" ~ block3vals)) %>%
mutate(
TypeCValues = case_when(
block1types == "Type C" ~ block1vals,
block2types == "Type C" ~ block2vals,
block3types == "Type C" ~ block3vals))
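For readers without the tidyverse, the same three passes can be sketched in base R with nested ifelse() calls. This is an illustrative translation rather than part of the answer above; it assumes data has the columns from the dput() output and that every entry of pairs is a permutation of A, B, and C.

```r
data <- data.frame(
  pairs = c("ABC", "ACB", "BAC", "BCA", "CBA", "CAB"),
  block1vals = c(1, 3, 5, 7, 9, 10),
  block2vals = c(4, 66, 34, 66, 21, 21),
  block3vals = c(53, 22, 12, 65, 21, 22)
)

# One pass per type: take the value from whichever block is labelled with it.
for (type in c("A", "B", "C")) {
  data[[paste0("Type", type, "Values")]] <- ifelse(
    substr(data$pairs, 1, 1) == type, data$block1vals,
    ifelse(substr(data$pairs, 2, 2) == type, data$block2vals,
           data$block3vals)  # otherwise the third block must carry this type
  )
}
data
```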
Answer 1 (score: 0)
Here is an approach that uses the dplyr and stringr packages.
library(dplyr)
library(stringr)
data %>%
# For each letter, determine the position of that letter in the entry in the 'pairs' column
mutate(a = str_locate(pairs, 'A')[,'start'],
b = str_locate(pairs, 'B')[,'start'],
c = str_locate(pairs, 'C')[,'start']) %>%
# Based on the letter's position, pull the value from the appropriate column.
# funs() is deprecated; across() (dplyr >= 1.0.0) applies the lambda to a, b and c.
mutate(across(a:c,
~ case_when(.x == 1 ~ block1vals,
.x == 2 ~ block2vals,
.x == 3 ~ block3vals)))
The str_locate() calls look odd because str_locate() returns a matrix.
Here is the function's output:
pairs <- c('ABCDE')
str_locate(pairs, 'BC')
     start end
[1,]     2   3
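Since the result is an ordinary integer matrix with column names, the two ways of indexing it can be tried on a hand-built stand-in without loading stringr. The matrix m below is constructed manually to have the same shape as the output above; it is not produced by str_locate().

```r
# Stand-in for str_locate('ABCDE', 'BC'): one row, columns "start" and "end"
m <- matrix(c(2L, 3L), nrow = 1, dimnames = list(NULL, c("start", "end")))

m[, "start"]  # column extraction by column name: 2
m["start"]    # element lookup by names(): NA, because m has no names attribute
```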
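To return only the position of the letter "B", you need to extract the column named start from the matrix.
You can combine the call to str_locate() with the column extraction by writing:
str_locate(pairs, 'BC')[, 'start']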
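The same position lookup can also be sketched without stringr: base R's regexpr() returns the position of the first match of a pattern in each string, and indexing a matrix with a two-column (row, column) matrix picks one element per row. This is a minimal alternative, assuming the same data columns; the names vals and pos are illustrative.

```r
data <- data.frame(
  pairs = c("ABC", "ACB", "BAC", "BCA", "CBA", "CAB"),
  block1vals = c(1, 3, 5, 7, 9, 10),
  block2vals = c(4, 66, 34, 66, 21, 21),
  block3vals = c(53, 22, 12, 65, 21, 22)
)

# Block values as a 6 x 3 numeric matrix: column k holds block k
vals <- as.matrix(data[, c("block1vals", "block2vals", "block3vals")])

for (letter in c("A", "B", "C")) {
  # Position of the letter in each 'pairs' string = block that has this type
  pos <- as.integer(regexpr(letter, data$pairs, fixed = TRUE))
  # Pick element [row i, column pos[i]] for every row
  data[[tolower(letter)]] <- vals[cbind(seq_len(nrow(vals)), pos)]
}
data
```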