Question

I want to convert df1 into df2.

Sample data frame df1 (the old format):

df1 <- structure(list(ID = 1:2,                Group = c(1L, 1L),
                      M1a2hB = c(0.2, 0.3),    M1a3hB = c(0.4, 0.6),
                      M2a2hB = c(0.3, 0.4),    M2a3hB = c(0.6, 0.6),
                      M1r2hB = c(200L, 300L),  M1r3hB = c(400L, 600L),
                      M2r2hB = c(300L, 400L),  M2r3hB = c(600L, 600L)),
                 .Names = c("ID", "Group", "M1a2hB", "M1a3hB", "M2a2hB",
                            "M2a3hB","M1r2hB", "M1r3hB","M2r2hB", "M2r3hB"),
                 class = "data.frame", row.names = c(NA, -2L))

ID Group M1a2hB M1a3hB M2a2hB M2a3hB.... M1r2hB M1r3hB M2r2hB M2r3hB ...
1   1      0.2  0.4    0.3   0.6    ...     200    400   300    600    ...
2   1      0.3  0.6    0.4   0.6    ...     300    600   400    600    ...

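Here, df1 has 100 IDs and 1100 columns. Each outcome measure has two columns for absolute change and two columns for relative change. There are nearly 270 outcome measures.

M1a2hB is the absolute change of the first measure from time 2 to baseline, and M1a3hB is its absolute change from time 3 to baseline. Similarly, M1r2hB is the relative change of the first outcome from time 2 to baseline, and M1r3hB is its relative change from time 3 to baseline.

The new df2: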

ID Group time  M1a           M2a        ...  M1r           M2r        ...
1  1     1     0.0           0.0        ...  000           000         ...
1  1     2     0.2           0.3        ...  200           300         ...
1  1     3     0.4           0.6        ...  400           600         ...
2  1     1     0.0           0.0        ...  000           000         ...
2  1     2     0.3           0.4        ...  300           400         ...
2  1     3     0.6           0.6        ...  600           600         ...

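Any tips? Feel free to ask for any clarification. Thanks in advance!

P.S. I tried to run some code from previous posts (see below if interested), but they look different because df1 is three-dimensional data and df2 contains an extra time column.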

In R, plotting wide form data with ggplot2 or base plot. Is there a way to use ggplot2 without melting wide form data frame?

Reshaping repeated measures data in R wide to long

Answer 1

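We can use sub to strip the trailing time suffix from the column names (giving 'nm1'), split the sequence of column positions by 'nm1', and pass the result as measure to melt from data.table to convert from 'wide' to 'long' format.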

library(data.table)
nm1 <- sub("\\d+[[:alpha:]]+$", '', names(df1)[-(1:2)])
lst <- split(seq_along(nm1)+2, nm1)
melt(setDT(df1), measure = lst, 
       value.name= names(lst), variable.name= 'time')[order(ID)]
#   ID Group time M1a M1r M2a M2r
#1:  1     1    1 0.2 200 0.3 300
#2:  1     1    2 0.4 400 0.6 600
#3:  2     1    1 0.3 300 0.4 400
#4:  2     1    2 0.6 600 0.6 600
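
To see what the two helper objects hold without installing data.table, here is a minimal base R sketch using the question's column names. Note that the time column in the output above is the position within each group (1, 2), not the original time point (2, 3). The names value_cols, time_labels and time_index are introduced here for illustration only.

```r
# Column names as in the question's df1.
cols <- c("ID", "Group", "M1a2hB", "M1a3hB", "M2a2hB", "M2a3hB",
          "M1r2hB", "M1r3hB", "M2r2hB", "M2r3hB")
value_cols <- cols[-(1:2)]

# Strip the trailing "<time>hB" part, leaving the measure prefix.
nm1 <- sub("\\d+[[:alpha:]]+$", "", value_cols)

# Column positions grouped by prefix (+2 skips ID and Group).
lst <- split(seq_along(nm1) + 2, nm1)
print(lst)

# Recover the real time points from the names.
time_labels <- as.integer(unique(sub("^M\\d+[ar](\\d+)hB$", "\\1", value_cols)))

# The melted time column is an index into each group, so it can be
# mapped back to the real time points like this.
time_index <- c(1L, 2L, 1L, 2L)  # the time column shown in the output above
print(time_labels[time_index])
```

With the melted table, the same mapping would be applied to as.integer(time); this assumes every measure has the same time points in the same column order.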

Data

df1 <- structure(list(ID = 1:2, Group = c(1L, 1L),
                      M1a2hB = c(0.2, 0.3), M1a3hB = c(0.4, 0.6),
                      M2a2hB = c(0.3, 0.4), M2a3hB = c(0.6, 0.6),
                      M1r2hB = c(200L, 300L), M1r3hB = c(400L, 600L),
                      M2r2hB = c(300L, 400L), M2r3hB = c(600L, 600L)),
                 .Names = c("ID", "Group", "M1a2hB", "M1a3hB", "M2a2hB",
                            "M2a3hB", "M1r2hB", "M1r3hB", "M2r2hB", "M2r3hB"),
                 class = "data.frame", row.names = c(NA, -2L))

Answer 2

Here is an answer using tidyr:

library(dplyr)
library(tidyr)
library(rex)

string_interpretation = 
  rex(capture("M", 
              digits, 
              or("a", "r")), 
      capture(digits))

result = 
  df1 %>%
  gather(string, value, -ID, -Group) %>%
  extract(string, c("variable", "time"), string_interpretation) %>%
  spread(variable, value)

Answer 3

The built-in base::reshape can do this nicely:

df1 <- structure(list(ID = 1:2, Group = c(1L, 1L),
                      M1a2hB = c(0.2, 0.3), M1a3hB = c(0.4, 0.6),
                      M2a2hB = c(0.3, 0.4), M2a3hB = c(0.6, 0.6),
                      M1r2hB = c(200L, 300L), M1r3hB = c(400L, 600L),
                      M2r2hB = c(300L, 400L), M2r3hB = c(600L, 600L)),
                 .Names = c("ID", "Group", "M1a2hB", "M1a3hB", "M2a2hB",
                            "M2a3hB","M1r2hB", "M1r3hB","M2r2hB", "M2r3hB"),
                 class = "data.frame", row.names = c(NA, -2L))
df1
#  ID Group M1a2hB M1a3hB M2a2hB M2a3hB M1r2hB M1r3hB M2r2hB M2r3hB
#   1     1    0.2    0.4    0.3    0.6    200    400    300    600
#   2     1    0.3    0.6    0.4    0.6    300    600    400    600

df2 <- reshape(df1, varying=list(c(3,4),c(5,6),c(7,8),c(9,10)),
               v.names=c("M1a", "M2a", "M1r", "M2r"),
               timevar="time", times=2:3, direction="long")
df2
#  ID Group time M1a M2a M1r M2r id
#   1     1    2 0.2 0.3 200 300  1
#   2     1    2 0.3 0.4 300 400  2
#   1     1    3 0.4 0.6 400 600  1
#   2     1    3 0.6 0.6 600 600  2

If you have n <- 270 measurements at m <- 2 time points (2h, 3h), change the arguments of reshape to

varying=split(1:(n*m*2)+2,rep(1:(n*2), each=m))
    # `*2` accounts for doubling by relative and absolute measurements.
    # `+2` accounts for the `ID` and `Group` columns at the beginning
v.names=c(paste0("M", 1:n, "a"), paste0("M", 1:n, "r"))

I assume that time==1 in your example df2 refers to the measurement at baseline, rather than to an unmentioned 1h time point, since the values appear to be all zero. For clarity, I show baseline as time==0. One way to show baseline in df2 is to add zero-valued baseline measurements to df1 and sort it.

n <- 2  # use n <- 270 for 270 outcomes, measured at each time point, reported both in absolute and relative terms

df1.5 <- data.frame(df1,
    setNames(as.list(rep(0,2*n)), c(paste0("M", 1:n, "a0hB"), paste0("M", 1:n, "r0hB"))))

df2 <- reshape(df1.5, varying=split(1:(n*3*2)+2, c(rep(1:(n*2), each=2), 1:(n*2))),
        v.names=c(paste0("M", 1:n, "a"), paste0("M", 1:n, "r")),
        timevar="time", idvar=c("Group", "ID"), times=c(2,3,0), direction="long")

#  ID Group time M1a M2a M1r M2r
#   1     1    2 0.2 0.3 200 300
#   2     1    2 0.3 0.4 300 400
#   1     1    3 0.4 0.6 400 600
#   2     1    3 0.6 0.6 600 600
#   1     1    0 0.0 0.0   0   0
#   2     1    0 0.0 0.0   0   0
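
The output above is ordered by time as passed in times (2, 3, 0). A minimal base R sketch of the sorting step mentioned in the text follows. It spells out the varying columns by name instead of by position and uses ID alone as idvar, which assumes ID is unique in the wide data; the helper names measures, times and varying are introduced here for illustration.

```r
df1 <- data.frame(
  ID = 1:2, Group = c(1L, 1L),
  M1a2hB = c(0.2, 0.3),   M1a3hB = c(0.4, 0.6),
  M2a2hB = c(0.3, 0.4),   M2a3hB = c(0.6, 0.6),
  M1r2hB = c(200L, 300L), M1r3hB = c(400L, 600L),
  M2r2hB = c(300L, 400L), M2r3hB = c(600L, 600L)
)

measures <- c("M1a", "M2a", "M1r", "M2r")

# Change from baseline is zero at baseline: add one 0hB column per measure.
for (m in measures) df1[[paste0(m, "0hB")]] <- 0

# One character vector of column names per measure, in time order.
times <- c(0, 2, 3)
varying <- lapply(measures, function(m) paste0(m, times, "hB"))

df2 <- reshape(df1, direction = "long",
               varying = varying, v.names = measures,
               timevar = "time", times = times, idvar = "ID")

# Sort by ID, then time, and drop the composite row names.
df2 <- df2[order(df2$ID, df2$time), ]
rownames(df2) <- NULL
print(df2)
```

Naming the columns explicitly is more verbose than the positional split() above, but it does not depend on the column order of df1.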

Answer 4

You can use my R package onetree, which is uploaded to my GitHub account, yikeshu0611.

install.packages("devtools") # if you do not have the devtools package installed
library(devtools)
install_github("yikeshu0611/onetree") # install the onetree package from GitHub

1. Step by step

First, I will show you how to convert wide data to long data step by step.

library(onetree)
long1=reshape_toLong(data=df1, 
                      id= "ID", 
                      j="newcolumn", 
       value.var.prefix=c("M1a","M2a","M1r","M2r"))

In this command, j is the name of the new column. You will get the long1 result shown below.

long1

ID Group newcolumn M1a M2a M1r M2r
1     1       2hB 0.2 0.3 200 300
1     1       3hB 0.4 0.6 400 600
2     1       2hB 0.3 0.4 300 400
2     1       3hB 0.6 0.6 600 600

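Looking at long1, we still see the columns M1a, M2a, ..., M1r, M2r, .... The data is still wide, so we can convert it to long again. We use M1, M2 as the prefixes, and a and r go into a new column, testway (the test method). The command is below.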

long2=reshape_toLong(data = long1,
                       id = c("ID","newcolumn"),
                        j = "testway",
        value.var.prefix = c("M1","M2"))
long2
   ID newcolumn Group testway    M1    M2
1  1       2hB     1       a   0.2   0.3
2  1       2hB     1       r 200.0 300.0
3  1       3hB     1       a   0.4   0.6
4  1       3hB     1       r 400.0 600.0
5  2       2hB     1       a   0.3   0.4
6  2       2hB     1       r 300.0 400.0
7  2       3hB     1       a   0.6   0.6
8  2       3hB     1       r 600.0 600.0

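Here we use two variables, ID and newcolumn, as the id. The id must identify each row of the input uniquely, so using ID alone would cause a mismatch. You can also create a new id, for example idnew.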

long1$idnew = 1:nrow(long1)
reshape_toLong(data = long1,
                 id = "idnew",
                 j = "testway",
            value.var.prefix = c("M1","M2"))

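Let's continue! In long2 there are still the columns M1, M2, ..., so long2 is still wide data. Yes, we can convert it to long once more, with M as the prefix and 1, 2, 3, ... going into a new column. However, the id should now be ID, newcolumn and testway together, or you can create a new id for long2 to make sure it is unique.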

long3=reshape_toLong(data = long2,
                 id = c("ID","newcolumn","testway"),
                 j = "testnumber",
                 value.var.prefix = "M")
long3
   ID newcolumn testway Group testnumber     M
1   1       2hB       a     1          1   0.2
2   1       2hB       a     1          2   0.3
3   1       2hB       r     1          1 200.0
4   1       2hB       r     1          2 300.0
5   1       3hB       a     1          1   0.4
6   1       3hB       a     1          2   0.6
7   1       3hB       r     1          1 400.0
8   1       3hB       r     1          2 600.0
9   2       2hB       a     1          1   0.3
10  2       2hB       a     1          2   0.4
11  2       2hB       r     1          1 300.0
12  2       2hB       r     1          2 400.0
13  2       3hB       a     1          1   0.6
14  2       3hB       a     1          2   0.6
15  2       3hB       r     1          1 600.0
16  2       3hB       r     1          2 600.0

Now long3 is fully long data.
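
For comparison, the same fully long layout can be sketched in base R without onetree: stack all value columns into one, then split each column name into its parts with a regular expression. This is not how reshape_toLong works internally, only an equivalent result under the assumption that every value column matches M<number><a|r><time>hB. The column names name, testnumber, testway and time mirror the ones used above.

```r
df1 <- data.frame(
  ID = 1:2, Group = c(1L, 1L),
  M1a2hB = c(0.2, 0.3),   M1a3hB = c(0.4, 0.6),
  M2a2hB = c(0.3, 0.4),   M2a3hB = c(0.6, 0.6),
  M1r2hB = c(200L, 300L), M1r3hB = c(400L, 600L),
  M2r2hB = c(300L, 400L), M2r3hB = c(600L, 600L)
)

value_cols <- setdiff(names(df1), c("ID", "Group"))

# Stack the value columns: one row per (ID, original column).
long <- data.frame(
  ID    = rep(df1$ID, times = length(value_cols)),
  Group = rep(df1$Group, times = length(value_cols)),
  name  = rep(value_cols, each = nrow(df1)),
  M     = unlist(df1[value_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Split the column name into measure number, a/r, and time point.
pattern <- "^M(\\d+)([ar])(\\d+)hB$"
long$testnumber <- as.integer(sub(pattern, "\\1", long$name))
long$testway    <- sub(pattern, "\\2", long$name)
long$time       <- as.integer(sub(pattern, "\\3", long$name))

long <- long[order(long$ID, long$time, long$testway, long$testnumber), ]
rownames(long) <- NULL
print(long)
```

Because the regular expression reads digits of any length, this also handles names such as M270a2hB.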

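The prefix is very important. We used the following prefixes:

First: M1a, M2a, M1r, M2r
Second: M1, M2
Third: M

We changed the id three times to keep it unique:

First: ID
Second: ID, newcolumn
Third: ID, newcolumn, testway

j is the new column:

First: newcolumn
Second: testway
Third: testnumber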

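2. A faster way

Suppose each outcome measure has 4 columns: a2, a3, r2, r3 (a: absolute; r: relative; 2: time 2; 3: time 3). Then 1100 columns hold 275 outcome measures (1100/4). So we have M1a2hB, M2a2hB, M3a2hB, ..., M275a2hB; M1a3hB, M2a3hB, M3a3hB, ..., M275a3hB; and so on. Written out by hand as above, value.var.prefix would be very long. However, we can build the prefixes much faster with the paste0 function.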

ma2=paste0("M",1:275,"a")
ma3=paste0("M",1:275,"a")
mr2=paste0("M",1:275,"r")
mr3=paste0("M",1:275,"r")
m=c(ma2,ma3,mr2,mr3)

In df1 we only have 2 outcome measures, so we can use the command below.

ma2=paste0("M",1:2,"a")
ma3=paste0("M",1:2,"a")
mr2=paste0("M",1:2,"r")
mr3=paste0("M",1:2,"r")
prefix=c(ma2,ma3,mr2,mr3)

reshape_toLong(data = df1,
                id = "ID",
                 j = "newcolumn",
  value.var.prefix = prefix)

  ID Group newcolumn M1a M2a M1r M2r
1  1     1       2hB 0.2 0.3 200 300
2  1     1       3hB 0.4 0.6 400 600
3  2     1       2hB 0.3 0.4 300 400
4  2     1       3hB 0.6 0.6 600 600
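
Because the prefixes carry no time component, ma2 and ma3 above are the same vector, and so are mr2 and mr3. A small sketch of that observation follows; the name prefix_answer is introduced here for illustration. The step-by-step section passed the four unique prefixes directly, which suggests the shorter vector is enough, but this sketch does not check how reshape_toLong treats duplicates.

```r
n <- 2  # number of outcome measures; 275 in the full data

# paste0 is vectorized over 1:n, so each call yields n prefixes.
prefix <- c(paste0("M", 1:n, "a"), paste0("M", 1:n, "r"))
print(prefix)

# The vector built in the answer lists every prefix twice.
prefix_answer <- c(paste0("M", 1:n, "a"), paste0("M", 1:n, "a"),
                   paste0("M", 1:n, "r"), paste0("M", 1:n, "r"))
print(identical(unique(prefix_answer), prefix))
```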

Alternatively, we can use M1, M2, ... as the prefixes, so that a2hB, a3hB, r2hB and r3hB go into the new column. Then we split the new column into separate columns.

m1=paste0("M",1:2)
m2=paste0("M",1:2)
prefix=c(m1,m2)

long4=reshape_toLong(data = df1,
                id = "ID",
                 j = "newcolumn",
  value.var.prefix = prefix)
long4
  ID Group newcolumn    M1    M2
1  1     1      a2hB   0.2   0.3
2  1     1      a3hB   0.4   0.6
3  1     1      r2hB 200.0 300.0
4  1     1      r3hB 400.0 600.0
5  2     1      a2hB   0.3   0.4
6  2     1      a3hB   0.6   0.6
7  2     1      r2hB 300.0 400.0
8  2     1      r3hB 600.0 600.0

long4$testway=Left(long4$newcolumn,1)
long4$time=Right(long4$newcolumn,3)
long4
  ID Group newcolumn    M1    M2 testway time
1  1     1      a2hB   0.2   0.3       a  2hB
2  1     1      a3hB   0.4   0.6       a  3hB
3  1     1      r2hB 200.0 300.0       r  2hB
4  1     1      r3hB 400.0 600.0       r  3hB
5  2     1      a2hB   0.3   0.4       a  2hB
6  2     1      a3hB   0.6   0.6       a  3hB
7  2     1      r2hB 300.0 400.0       r  2hB
8  2     1      r3hB 600.0 600.0       r  3hB

Finally, we can use only M as the prefix to get fully long data.

long5=reshape_toLong(data = df1,
                       id = "ID",
                        j = "newcolumn",
         value.var.prefix = "M")
long5
   ID Group newcolumn     M
1   1     1     1a2hB   0.2
2   1     1     1a3hB   0.4
3   1     1     2a2hB   0.3
4   1     1     2a3hB   0.6
5   1     1     1r2hB 200.0
6   1     1     1r3hB 400.0
7   1     1     2r2hB 300.0
8   1     1     2r3hB 600.0
9   2     1     1a2hB   0.3
10  2     1     1a3hB   0.6
11  2     1     2a2hB   0.4
12  2     1     2a3hB   0.6
13  2     1     1r2hB 300.0
14  2     1     1r3hB 600.0
15  2     1     2r2hB 400.0
16  2     1     2r3hB 600.0

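Then we can use the Left, Mid and Right functions in the onetree package to take substrings from the left, the middle and the right to get the new columns.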

long5$testnumber=Left(long5$newcolumn,1)
long5$testway=Mid(long5$newcolumn,2,1)
long5$time=Right(long5$newcolumn,3)
long5
   ID Group newcolumn     M testnumber testway time
1   1     1     1a2hB   0.2          1       a  2hB
2   1     1     1a3hB   0.4          1       a  3hB
3   1     1     2a2hB   0.3          2       a  2hB
4   1     1     2a3hB   0.6          2       a  3hB
5   1     1     1r2hB 200.0          1       r  2hB
6   1     1     1r3hB 400.0          1       r  3hB
7   1     1     2r2hB 300.0          2       r  2hB
8   1     1     2r3hB 600.0          2       r  3hB
9   2     1     1a2hB   0.3          1       a  2hB
10  2     1     1a3hB   0.6          1       a  3hB
11  2     1     2a2hB   0.4          2       a  2hB
12  2     1     2a3hB   0.6          2       a  3hB
13  2     1     1r2hB 300.0          1       r  2hB
14  2     1     1r3hB 600.0          1       r  3hB
15  2     1     2r2hB 400.0          2       r  2hB
16  2     1     2r3hB 600.0          2       r  3hB
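
If onetree is not available, the same substrings can be taken with base R's substr. The lowercase helpers left, mid and right below are hypothetical stand-ins written for this sketch. Their behavior is inferred from the outputs shown above (Left(x, 1), Mid(x, 2, 1), Right(x, 3)), not from the package source.

```r
# Hypothetical base R stand-ins for Left, Mid and Right.
left  <- function(x, n) substr(x, 1, n)
mid   <- function(x, start, n) substr(x, start, start + n - 1)
right <- function(x, n) substr(x, nchar(x) - n + 1, nchar(x))

x <- c("1a2hB", "2r3hB")
print(left(x, 1))    # measure number
print(mid(x, 2, 1))  # a or r
print(right(x, 3))   # time label
```

Fixed positions only work while the measure number has a single digit. With 275 measures, a value such as "10a2hB" would need a regular expression instead.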

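Here we used different prefixes to get different data:

First: prefixes built with the paste0 function
Second: M1, M2, M3, ..., still built with paste0 but simpler
Third: only M
We did not change id and j.

3. Conclusion

In the reshape_toLong function:

data: the data you want to convert
id: the unique id variable; it can be one or more variables
j: the name of the new variable into which the time or sequence number is stacked
value.var.prefix: the prefix of the value variables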

Reshaping data in R (wide -> long)
