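I have this data frame in which each question has two sub-parts (q1, q1_p2). I want to move the answer for the second sub-part onto the same row as the first.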
question answer
q1 bleh
q1_p2 bah
q2 meh
q2_p2 bleh
Basically, I want it to look like this:
question answer p2
q1 bleh bah
q2 meh bleh
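I would normally use something like spread, but I don't know how to account for the fact that the key value is different for each question.
Any ideas?
Answer 0 (score: 2)
If your full dataset follows the structure of the example, then this should be enough: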
library(dplyr)
library(tidyr)
df %>%
  group_by(question = sub('_.*', '', question)) %>%
  mutate(new = seq(n())) %>%
  spread(new, answer) %>%
  rename(answer = `1`, p2 = `2`) %>%
  ungroup()
# A tibble: 2 × 3
# question answer p2
#* <chr> <fctr> <fctr>
#1 q1 bleh bah
#2 q2 meh bleh
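In tidyr 1.0.0 and later, spread is superseded by pivot_wider, so the same reshaping might look like the sketch below on a newer installation. This is only an illustration, not part of the answer above: the sample data frame is rebuilt from the question, the helper column name part is made up for the example, and it assumes that every sub-part row has an underscore suffix such as _p2.

```r
library(dplyr)
library(tidyr)

# Sample data reconstructed from the question
df <- data.frame(
  question = c("q1", "q1_p2", "q2", "q2_p2"),
  answer   = c("bleh", "bah", "meh", "bleh"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  mutate(
    # "part" is a hypothetical helper column: the suffix after "_" (e.g. "p2"),
    # or "answer" for rows that have no suffix
    part     = ifelse(grepl("_", question), sub(".*_", "", question), "answer"),
    # Strip the suffix so both sub-parts share the same question id
    question = sub("_.*", "", question)
  ) %>%
  # One row per question, one column per part
  pivot_wider(names_from = part, values_from = answer)

print(wide)
```

Naming the target column from the suffix, rather than from the row position within each group, means the result does not depend on the order of the rows.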
Answer 1 (score: 0)
Well, this isn't as tidy as I had hoped, but it works.
library(data.table)
dt = data.table("question" = c("q1", "q1_p2", "q2", "q2_p2"), "answer" = c("bleh","bah","meh","bleh"))
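# Base question id: everything before the first underscore
dt$q = sapply(dt$question, function(x) strsplit(x, "_")[[1]][1])
dt[ , "Row" := 1:.N]
# Target column: "answer" when the name contains exactly one digit (assumes single-digit question numbers), otherwise the suffix such as "p2"
dt[ , "New" := ifelse(nchar(gsub("\\D", "", question)) == 1, "answer", gsub("(.+(?=p\\d+))", "", question, perl = TRUE)), by = .(Row)]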
dt = dcast(dt, q ~ New, value.var = "answer")
> dt
q answer p2
1: q1 bleh bah
2: q2 meh bleh
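A shorter data.table variant is possible with tstrsplit, which splits the question name on the underscore and pads missing pieces with NA. The sketch below is an alternative under the same assumption about the data (a single underscore separates the question id from the sub-part), and the column name part and the result name wide_dt are chosen only for illustration.

```r
library(data.table)

dt <- data.table(
  question = c("q1", "q1_p2", "q2", "q2_p2"),
  answer   = c("bleh", "bah", "meh", "bleh")
)

# Split "q1_p2" into q = "q1" and part = "p2"; rows without "_" get part = NA
dt[, c("q", "part") := tstrsplit(question, "_", fixed = TRUE, fill = NA)]

# Rows with no suffix hold the main answer
dt[is.na(part), part := "answer"]

# One row per question id, one column per part
wide_dt <- dcast(dt, q ~ part, value.var = "answer")

print(wide_dt)
```

Because the split does not count digits, this version should also cope with question ids such as q10.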
Answer 2 (score: 0)
Here is a tidyverse option:
library(tidyverse)
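# fill = "right" pads rows without a "_p2" suffix with NA and avoids the missing-pieces warning
separate(df1, question, into = c("question", "value"), fill = "right") %>%
  mutate(value = replace(value, is.na(value), "answer")) %>%
  spread(value, answer)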
# question answer p2
#1 q1 bleh bah
#2 q2 meh bleh