Question

I currently have a large dataset that looks similar to this:

cid dyad f1 f2 op1 op2 ed1 ed2 junk 
1   2    0  0  2   4   5   7   0.876
1   5    0  1  2   4   4   3   0.765

etc.

I would like to get it into a long data frame that looks like this:

cid dyad f op ed junk  id
1   2    0 2  5  0.876 1
1   2    0 4  7  0.876 2
1   5    0 2  4  0.765 1
1   5    1 4  3  0.765 2

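I have tried the gather() function as well as the reshape() function, but I can't figure out how to create multiple columns rather than collapsing all of the columns into a single long column.

Thanks for any help.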

Answer 1

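You can use the base reshape() function to (roughly) melt several sets of variables at the same time, by using the varying argument and setting direction to "long".

For example, here you supply a list of three "sets" (vectors) of variable names to the varying argument: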

dat <- read.table(text="
cid dyad f1 f2 op1 op2 ed1 ed2 junk 
1   2    0  0  2   4   5   7   0.876
1   5    0  1  2   4   4   3   0.765
", header=TRUE)

reshape(dat, direction="long", 
        varying=list(c("f1","f2"), c("op1","op2"), c("ed1","ed2")), 
        v.names=c("f","op","ed"))

You will end up with this:

    cid dyad  junk time f op ed id
1.1   1    2 0.876    1 0  2  5  1
2.1   1    5 0.765    1 0  2  4  2
1.2   1    2 0.876    2 0  4  7  1
2.2   1    5 0.765    2 1  4  3  2

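Notice that, in addition to the three sets being collapsed, two more variables are created: an $id variable, which tracks the row number in the original table (dat), and a $time variable, which corresponds to the order of the original variables that were collapsed. There are also now nested row names (1.1, 2.1, 1.2, 2.2), which here are just the values of $id and $time at that row, respectively.

Without knowing exactly what you are trying to track, it is hard to say whether $time or $id is what you want to use as the row identifier, but they are both there.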
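To make the relationship between the row names, $id and $time concrete, here is a minimal sketch that rebuilds the row names from those two columns and then sorts the result so that the rows coming from the same original row sit together. The names long and long_sorted are introduced only for this illustration.

```r
dat <- read.table(text="
cid dyad f1 f2 op1 op2 ed1 ed2 junk
1   2    0  0  2   4   5   7   0.876
1   5    0  1  2   4   4   3   0.765
", header=TRUE)

long <- reshape(dat, direction="long",
        varying=list(c("f1","f2"), c("op1","op2"), c("ed1","ed2")),
        v.names=c("f","op","ed"))

# Each row name is "<id>.<time>", so it can be rebuilt from the two columns
rebuilt <- paste(long$id, long$time, sep=".")
print(identical(rownames(long), rebuilt))

# Sort by original row (id), then by position within the set (time)
long_sorted <- long[order(long$id, long$time), ]
print(long_sorted)
```

Sorting this way gives the row order shown in the question, with $time playing the role of the id column the question asks for.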

It may also be useful to experiment with the timevar and idvar arguments (for example, you could set timevar to NULL).
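As one possible way of using those arguments, the sketch below names the time column id (to match the layout in the question) and uses cid and dyad as the identifier. It assumes that cid and dyad together uniquely identify each row of the wide data, which holds for the two sample rows but is not confirmed for the full dataset.

```r
dat <- read.table(text="
cid dyad f1 f2 op1 op2 ed1 ed2 junk
1   2    0  0  2   4   5   7   0.876
1   5    0  1  2   4   4   3   0.765
", header=TRUE)

long <- reshape(dat, direction="long",
        varying=list(c("f1","f2"), c("op1","op2"), c("ed1","ed2")),
        v.names=c("f","op","ed"),
        idvar=c("cid","dyad"),  # assumption: unique per wide row
        timevar="id",           # name the within-row counter "id"
        times=1:2)

# Reorder rows and columns to match the layout requested in the question
long <- long[order(long$cid, long$dyad, long$id),
             c("cid","dyad","f","op","ed","junk","id")]
rownames(long) <- NULL
print(long)
```

Because idvar refers to columns that already exist, reshape() does not add a separate row-number column in this variant.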

Answer 2

The tidyr package can solve this problem using the functions gather, separate and spread:

df<-read.table(header=TRUE, text="cid dyad f1 f2 op1 op2 ed1 ed2 junk 
1   2    0  0  2   4   5   7   0.876
               1   5    0  1  2   4   4   3   0.765")

library(tidyr)

# Note: sep = -2 is kept from the original answer. Depending on the tidyr
# version, sep = -1 may be needed instead to split off the final digit.
print(df %>% gather(name, value, -c(cid, dyad, junk)) %>%
  separate(name, into=c("name", "id"), sep= -2) %>%
  spread(key=name, value)
)


# Step by step:
  # gather the f, op and ed columns, keeping cid, dyad and junk as identifiers
df<-gather(df, name, value, -c(cid, dyad, junk))
  # separate the numeric id from the variable name
df<-separate(df, name, into=c("name", "id"), sep= -2)
  # make the data wide again, one column per variable name
df<-spread(df, key=name, value)
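The three steps above can also be written as a single call. This is a sketch using pivot_longer(), a different function from the ones in this answer, and it assumes tidyr 1.0.0 or later is installed. The regular expression passed to names_pattern is an assumption that every variable name is lowercase letters followed by digits, as in the sample data.

```r
library(tidyr)

df <- read.table(header=TRUE, text="cid dyad f1 f2 op1 op2 ed1 ed2 junk
1   2    0  0  2   4   5   7   0.876
1   5    0  1  2   4   4   3   0.765")

# ".value" tells pivot_longer() to use the first captured group (f, op, ed)
# as output column names; the second group (1, 2) goes into the "id" column.
long <- pivot_longer(df,
                     cols = -c(cid, dyad, junk),
                     names_to = c(".value", "id"),
                     names_pattern = "([a-z]+)([0-9]+)")
print(long)
```

Here id comes back as a character column, so convert it with as.integer() if a numeric id is needed.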

Converting data from wide to long (using multiple columns)