Question

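I know there are several "duplicate" threads on this topic, but I have read through them and still can't work out how to make reshape do what I want.

I have a dataset in which participants took two tests, each with two questions (i.e. question 1 test 1, question 2 test 1, question 1 test 2, question 2 test 2). They can get each question right or wrong. I changed the answers for test 1 to 0 and 1 so that the problem is easier to spot.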

df <- read.table(header = T, text = "

subj Q1.test1 Q2.test1 Q1.test2 Q2.test2
 1        0        1    right    wrong
 2        0        1    wrong    wrong

")

I would like to reshape it to long format so that it looks like this:

subj question test1 test2
 1       Q1     0   right
 2       Q1     0   wrong
 1       Q2     1   wrong
 2       Q2     1   wrong

However, whenever I try to reshape it, I don't get the data frame I want.

df.long <- reshape(df, direction = "long",
                    varying = c("Q1.test1", "Q2.test1", "Q1.test2", "Q2.test2"),
                    timevar = "question",
                    times = c("Q1", "Q2"),
                    v.names = c("test1", "test2"),
                    idvar = "subj")

df.long

subj question test1 test2
  1       Q1     0     1
  2       Q1     0     1
  1       Q2 right wrong
  2       Q2 wrong wrong

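The problem is of course the arguments I am passing to reshape; is it possible to do what I'm after with reshape, or should I look for another package?

Thanks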

Answer 1

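Here is an approach using the tidyr package. Note: when creating the data frame, use stringsAsFactors = FALSE (already the default in R 4.0.0 and later), otherwise you will get a warning. I call the data frame df1.

gather converts from wide to long, separate splits the column names into new columns, and spread creates one column of values for each test.

A useful tutorial compares tidyr and reshape.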

library(tidyr)

df1 %>% 
  gather(Var, Val, -subj) %>% 
  separate(Var, sep = "\\.", into = c("question", "test")) %>% 
  spread(test, Val)

Result:

  subj question test1 test2
1    1       Q1     0 right
2    1       Q2     1 wrong
3    2       Q1     0 wrong
4    2       Q2     1 wrong
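
On a recent tidyr (1.0.0 or later, which is an assumption about the installed version), the three steps above can probably be collapsed into a single pivot_longer call. A minimal sketch, where the special ".value" entry tells pivot_longer to use the part of each column name after the dot as the name of an output column:

```r
library(tidyr)

# Same example data as in the question, read as character rather than factor.
df1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
subj Q1.test1 Q2.test1 Q1.test2 Q2.test2
 1        0        1    right    wrong
 2        0        1    wrong    wrong
")

# Split each column name at the dot: the first piece goes into the
# "question" column, the second piece (".value") names the output
# columns test1 and test2.
df1.long <- pivot_longer(df1, cols = -subj,
                         names_to = c("question", ".value"),
                         names_sep = "\\.")
df1.long
```

This returns a tibble rather than a plain data frame; wrap it in as.data.frame() if that matters downstream.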

Data:

df1 <- read.table(header = TRUE, 
                  text = "subj Q1.test1 Q2.test1 Q1.test2 Q2.test2
 1        0        1    right    wrong
 2        0        1    wrong    wrong", 
                   stringsAsFactors = FALSE)
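
As for whether base reshape can do this: it appears it can. When varying is a single vector, reshape expects the names ordered by time (here: all columns for Q1, then all columns for Q2), which is why the attempt in the question mixes the tests up. A sketch that sidesteps the ordering issue by passing varying as a list with one vector per output column:

```r
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
subj Q1.test1 Q2.test1 Q1.test2 Q2.test2
 1        0        1    right    wrong
 2        0        1    wrong    wrong
")

# Each list element holds the wide columns that feed one long column,
# in the same order as v.names; within each element the order follows times.
df.long <- reshape(df, direction = "long",
                   varying = list(c("Q1.test1", "Q2.test1"),
                                  c("Q1.test2", "Q2.test2")),
                   v.names = c("test1", "test2"),
                   timevar = "question",
                   times = c("Q1", "Q2"),
                   idvar = "subj")
df.long
```

The rows come back grouped by question (Q1 for both subjects, then Q2), matching the layout requested in the question; the row names are of the form 1.Q1, 2.Q1, and so on.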

Reshaping to long with multiple value columns
