Question

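Apologies if this is a simple question, but I could not find a simple solution after searching. I'm quite new to R and am having trouble converting from wide to long format with the melt (reshape2) or gather (tidyr) functions. The dataset I'm working with contains 22 different time variables, each measured over 3 time periods. The problem appears when I try to convert all of them from wide to long at once. I have managed to convert them one at a time, but that is inefficient and long-winded, so I was wondering whether anyone could suggest a simpler solution. Below is a sample dataset I created that is formatted like the one I'm working with: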

Subject <- c(1, 2, 3)
BlueTime1 <- c(2, 5, 6)
BlueTime2 <- c(4, 6, 7)
BlueTime3 <- c(1, 2, 3)
RedTime1 <- c(2, 5, 6)
RedTime2 <- c(4, 6, 7)
RedTime3 <- c(1, 2, 3)
GreenTime1 <- c(2, 5, 6)
GreenTime2 <- c(4, 6, 7)
GreenTime3 <- c(1, 2, 3)

sample.df <- data.frame(Subject, BlueTime1, BlueTime2, BlueTime3,
                    RedTime1, RedTime2, RedTime3,
                    GreenTime1,GreenTime2, GreenTime3)

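The solution that worked for me was to use the gather function from tidyr, arrange the data by Subject (so that each subject's rows are grouped together), and then select only the subject, time period and rating columns. I did this for each variable (22 in my case).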

install.packages("dplyr")
install.packages("tidyr")
library(dplyr)
library(tidyr)

BlueGather <- gather(sample.df, Time_Blue, Rating_Blue, c(BlueTime1,
                                                          BlueTime2,
                                                          BlueTime3))
BlueSorted <- arrange(BlueGather, Subject)

BlueSubtracted <- select(BlueSorted, Subject, Time_Blue, Rating_Blue)

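After this code, I combined everything into a single data frame. This seems very slow and inefficient, and I was hoping someone could help me find a simpler solution. Thanks!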

Answer 1

We can use melt from data.table, whose measure argument can take multiple patterns as regular expressions:

library(data.table)
melt(setDT(sample.df), measure = patterns("^Blue", "^Red", "^Green"),
     value.name = c("BlueTime", "RedTime", "GreenTime"), variable.name = "time")
#   Subject time BlueTime RedTime GreenTime
#1:       1    1        2       2         2
#2:       2    1        5       5         5
#3:       3    1        6       6         6
#4:       1    2        4       4         4
#5:       2    2        6       6         6
#6:       3    2        7       7         7
#7:       1    3        1       1         1
#8:       2    3        2       2         2
#9:       3    3        3       3         3

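Or, as @StevenBeaupré mentioned in the comments, if there are many patterns, one option is to build the patterns argument from the names of the dataset after extracting the substrings (the names with their digits removed):

melt(setDT(sample.df), measure = patterns(as.list(unique(sub("\\d+", "", names(sample.df)[-1])))),
     value.name = c("BlueTime", "RedTime", "GreenTime"), variable.name = "time")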
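With 22 colour variables, the value.name vector can come from the same substrings as the patterns, so no stem has to be typed by hand. A minimal sketch, assuming data.table is installed and every measured column is named <stem><digits> as in sample.df; the names stems and stem_patterns are illustrative, not part of the answer. Anchoring each pattern with ^ and \\d+$ keeps BlueTime from also matching a hypothetical LightBlueTime1 column:

```r
library(data.table)

sample.df <- data.frame(Subject = c(1, 2, 3),
                        BlueTime1 = c(2, 5, 6), BlueTime2 = c(4, 6, 7), BlueTime3 = c(1, 2, 3),
                        RedTime1 = c(2, 5, 6), RedTime2 = c(4, 6, 7), RedTime3 = c(1, 2, 3),
                        GreenTime1 = c(2, 5, 6), GreenTime2 = c(4, 6, 7), GreenTime3 = c(1, 2, 3))

# as.data.table() copies, so sample.df itself stays a plain data.frame
dt <- as.data.table(sample.df)

# Stems such as "BlueTime": strip the trailing period number from every non-id column
stems <- unique(sub("\\d+$", "", setdiff(names(dt), "Subject")))

# One anchored regex per stem; each regex selects one group of measure columns
stem_patterns <- paste0("^", stems, "\\d+$")

long <- melt(dt, id.vars = "Subject",
             measure.vars = patterns(stem_patterns),
             value.name = stems, variable.name = "time")
long
```

as.data.table() is used here instead of setDT() because setDT() converts sample.df in place, which matters if the original data.frame is reused later.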


Answer 2

If your goal is to convert the three colours to long format, you can do it with the base R reshape function:

reshape(sample.df, idvar="Subject", varying=2:length(sample.df), sep="", direction="long")
    Subject time BlueTime RedTime GreenTime
1.1       1    1        2       2         2
2.1       2    1        5       5         5
3.1       3    1        6       6         6
1.2       1    2        4       4         4
2.2       2    2        6       6         6
3.2       3    2        7       7         7
1.3       1    3        1       1         1
2.3       2    3        2       2         2
3.3       3    3        3       3         3

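The time variable captures the 1, 2, 3 at the end of the wide variable names. The varying argument tells reshape which variables should be converted to long. The sep argument tells reshape to look for numbers at the end of the varying variable names that are not separated from the name by any character, and the direction argument tells the function to convert to long format.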

I always supply the idvar argument, even when the id variable isn't needed for later reference.
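The guess that sep = "" makes can also be spelled out. A sketch on the same sample.df that passes varying as a list with one vector of column names per colour, plus explicit v.names, timevar and times, so the result does not depend on how reshape() parses the names. Deriving the stems by stripping trailing digits assumes every non-id column ends in its period number; stems is an illustrative name:

```r
sample.df <- data.frame(Subject = c(1, 2, 3),
                        BlueTime1 = c(2, 5, 6), BlueTime2 = c(4, 6, 7), BlueTime3 = c(1, 2, 3),
                        RedTime1 = c(2, 5, 6), RedTime2 = c(4, 6, 7), RedTime3 = c(1, 2, 3),
                        GreenTime1 = c(2, 5, 6), GreenTime2 = c(4, 6, 7), GreenTime3 = c(1, 2, 3))

# Stems such as "BlueTime", in column order; each becomes one long-format column
stems <- unique(sub("\\d+$", "", names(sample.df)[-1]))

# One vector of wide column names per stem, ordered by period 1, 2, 3
varying <- lapply(stems, function(s) paste0(s, 1:3))

long <- reshape(sample.df, direction = "long", idvar = "Subject",
                varying = varying, v.names = stems,
                timevar = "time", times = 1:3)
long
```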

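If your data.frame doesn't actually use numbers for the time periods, a fairly simple solution is to change the variable names. For example, the following replaces "_Pre" with "1" at the end of any such variable name: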

names(df)[grep("_Pre$", names(df))] <- gsub("_Pre$", "1",
                                            names(df)[grep("_Pre$", names(df))])
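The same renaming can cover every period suffix in one pass before calling reshape(). A sketch, assuming hypothetical columns ending in _Pre, _Post and _Final and the study order Pre = 1, Post = 2, Final = 3; df and suffix_map are illustrative names:

```r
# Hypothetical wide data whose period suffixes are words rather than numbers
df <- data.frame(Subject = c(1, 2, 3),
                 BlueTime_Pre = c(2, 5, 6), BlueTime_Post = c(4, 6, 7), BlueTime_Final = c(1, 2, 3),
                 RedTime_Pre = c(2, 5, 6), RedTime_Post = c(4, 6, 7), RedTime_Final = c(1, 2, 3),
                 GreenTime_Pre = c(2, 5, 6), GreenTime_Post = c(4, 6, 7), GreenTime_Final = c(1, 2, 3))

# Map each word suffix to its period number (assumed order)
suffix_map <- c("_Pre" = "1", "_Post" = "2", "_Final" = "3")

# Replace each suffix only where it ends a name, e.g. BlueTime_Pre -> BlueTime1
for (suffix in names(suffix_map)) {
  names(df) <- sub(paste0(suffix, "$"), suffix_map[[suffix]], names(df))
}

long <- reshape(df, idvar = "Subject", varying = setdiff(names(df), "Subject"),
                sep = "", direction = "long")
long
```

Keeping suffix_map around makes it easy to translate the numeric time column back into the period labels afterwards.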

Answer 3

The idea here is to gather() all the time variables (every variable except Subject), use separate() on key to split it into label and time, and then spread() label and value to get the desired output.

library(dplyr)
library(tidyr)

sample.df %>%
  gather(key, value, -Subject) %>%
  separate(key, into = c("label", "time"), "(?<=[a-z])(?=[0-9])") %>%
  spread(label, value)

which gives:

#  Subject time BlueTime GreenTime RedTime
#1       1    1        2         2       2
#2       1    2        4         4       4
#3       1    3        1         1       1
#4       2    1        5         5       5
#5       2    2        6         6       6
#6       2    3        2         2       2
#7       3    1        6         6       6
#8       3    2        7         7       7
#9       3    3        3         3       3
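In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A sketch of the same reshaping in a single pivot_longer() call, assuming tidyr 1.0.0 or later; this swaps in a different function rather than reproducing the answer's code. The special ".value" entry in names_to sends the first captured group (BlueTime, RedTime, GreenTime) to the output column names and the second (the trailing digits) to time:

```r
library(tidyr)

sample.df <- data.frame(Subject = c(1, 2, 3),
                        BlueTime1 = c(2, 5, 6), BlueTime2 = c(4, 6, 7), BlueTime3 = c(1, 2, 3),
                        RedTime1 = c(2, 5, 6), RedTime2 = c(4, 6, 7), RedTime3 = c(1, 2, 3),
                        GreenTime1 = c(2, 5, 6), GreenTime2 = c(4, 6, 7), GreenTime3 = c(1, 2, 3))

long <- pivot_longer(sample.df,
                     cols = -Subject,
                     names_to = c(".value", "time"),
                     # group 1: the stem (lazy, so it stops before the digits);
                     # group 2: the trailing period number
                     names_pattern = "^(.*?)(\\d+)$")
long
```

Here time comes back as a character column, and the rows are ordered by Subject and then by period.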

Note

Here we use the regex from another answer in separate() to split at the first digit encountered.
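To see where that pattern splits a name, a base-R sketch (no tidyr needed) that locates the zero-width match between the last lowercase letter and the first digit, then cuts each key there. The keys vector and the names split_at, label and time are illustrative; separate() does its own splitting internally, so this only mirrors its result:

```r
keys <- c("BlueTime1", "RedTime2", "GreenTime3")

# Zero-width match: a position preceded by a lowercase letter and followed by a digit.
# regexpr() returns the position of the character right after that boundary.
split_at <- regexpr("(?<=[a-z])(?=[0-9])", keys, perl = TRUE)

label <- substr(keys, 1, split_at - 1)   # text before the boundary
time  <- substring(keys, split_at)       # the digit(s) after it
label
time
```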

Edit

I understand from your comment that the column names in your dataset are actually in the format ColorTime_Pre, ColorTime_Post, ColorTime_Final. If that is the case, you don't need to specify a regex in separate(), because the default sep = "[^[:alnum:]]+" will match the _ and split the key into label and time accordingly:

sample.df %>%
  gather(key, value, -Subject) %>%
  separate(key, into = c("label", "time")) %>%
  spread(label, value)

which gives:

#  Subject  time BlueTime GreenTime RedTime
#1       1 Final        1         1       1
#2       1  Post        4         4       4
#3       1   Pre        2         2       2
#4       2 Final        2         2       2
#5       2  Post        6         6       6
#6       2   Pre        5         5       5
#7       3 Final        3         3       3
#8       3  Post        7         7       7
#9       3   Pre        6         6       6
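Because time is a character column in this case, the rows come out in alphabetical order (Final, Post, Pre) rather than in study order. A sketch that converts time to a factor with the period order as its levels and sorts by it, assuming hypothetical ColorTime_Pre/_Post/_Final columns filled with the values from sample.df and assuming Pre, Post, Final is the intended order:

```r
library(dplyr)
library(tidyr)

# Hypothetical data in the <Colour>Time_<Period> format described in the comment
df <- data.frame(Subject = c(1, 2, 3),
                 BlueTime_Pre = c(2, 5, 6), BlueTime_Post = c(4, 6, 7), BlueTime_Final = c(1, 2, 3),
                 RedTime_Pre = c(2, 5, 6), RedTime_Post = c(4, 6, 7), RedTime_Final = c(1, 2, 3),
                 GreenTime_Pre = c(2, 5, 6), GreenTime_Post = c(4, 6, 7), GreenTime_Final = c(1, 2, 3))

long <- df %>%
  gather(key, value, -Subject) %>%
  separate(key, into = c("label", "time")) %>%  # default sep splits at the "_"
  spread(label, value) %>%
  # Factor levels follow the study order, so arrange() sorts Pre < Post < Final
  mutate(time = factor(time, levels = c("Pre", "Post", "Final"))) %>%
  arrange(Subject, time)
long
```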


R: converting from wide to long format with multiple variables measured over 3 time periods
