Data frame reshaping: merging 2 rows into one row, by value

Time: 2015-09-22 03:21:54

Tags: r

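I'm new to R. I've done a lot of research and testing to find an elegant answer to this problem. I tried reshape, t, melt, and so on. I'm also struggling to rename the variables. I'm stuck with a data frame like the one below. Each question row carries the time the question was shown (next to Question1), and the row right after it carries the answer together with the time the answer was recorded.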

    Time            Logs
    446.6204    Question1
    452.7516    4
    452.7516    Question2
    458.1999    3
    458.1999    Question3
    460.2342    5

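I would like to put everything on a single row and name the variables using the values in "Logs". Luckily for me, the pattern is constant, so slicing might work well.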

    Respondent   TimeQ1    Question1  TimeA1    TimeQ2    Question2  TimeA2    TimeQ3    Question3  TimeA3
    Respondent1  446.6204  4          452.7516  452.7516  3          458.1999  458.1999  5          460.2342

Thanks for your help!

1 answer:

Answer 0: (score: 0)

I added a column for the respondent and added data for several respondents. Here is the sample dataset:

DF <- structure(list(Respondent = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Respondent 1", 
"Respondent 2", "Respondent 3"), class = "factor"), Time = c(446.6204, 
452.7516, 452.7516, 458.1999, 458.1999, 460.2342, 535.94448, 
543.30192, 543.30192, 549.83988, 549.83988, 552.28104, 443.2204, 
449.3516, 449.3516, 454.7999, 454.7999, 456.8342), Logs = structure(c(6L, 
4L, 7L, 3L, 8L, 5L, 6L, 5L, 7L, 2L, 8L, 3L, 6L, 1L, 7L, 4L, 8L, 
5L), .Label = c("1", "2", "3", "4", "5", "Question1", "Question2", 
"Question3"), class = "factor")), .Names = c("Respondent", "Time", 
"Logs"), row.names = c(NA, -18L), class = "data.frame")

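I don't think putting all the data on a single row is your best option. If you have a lot of questions, each row will become very long.

Here is the format I suggested earlier (which I still think is better), with one row per respondent and question: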

# TRUE for rows whose Logs value is a question label; the other rows hold the answers
isQuestion <- grepl("Question", DF$Logs)

newDF <- data.frame(respondent = DF$Respondent[isQuestion],
                    question = as.character(DF$Logs[isQuestion]),
                    questionTime = DF$Time[isQuestion],
                    responseValue = DF$Logs[!isQuestion],
                    responseTime = DF$Time[!isQuestion])
newDF

#   respondent  question questionTime responseValue responseTime
# Respondent 1 Question1     446.6204             4     452.7516
# Respondent 1 Question2     452.7516             3     458.1999
# Respondent 1 Question3     458.1999             5     460.2342
# Respondent 2 Question1     535.9445             5     543.3019
# Respondent 2 Question2     543.3019             2     549.8399
# Respondent 2 Question3     549.8399             3     552.2810
# Respondent 3 Question1     443.2204             1     449.3516
# Respondent 3 Question2     449.3516             4     454.7999
# Respondent 3 Question3     454.7999             5     456.8342
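The construction above pairs each question row with the row that follows it, so it only gives correct results if the log strictly alternates between a question and its answer. The following is a minimal sketch of how that assumption could be checked before slicing. It uses the single-respondent data from the question; the names `logs`, `expected`, and `long` are made up for this illustration, and `Logs` is stored as character here (not as a factor, as in `DF`) so the answers can be converted to numbers directly.

```r
# Single-respondent data from the question, with Logs kept as character
logs <- data.frame(
  Time = c(446.6204, 452.7516, 452.7516, 458.1999, 458.1999, 460.2342),
  Logs = c("Question1", "4", "Question2", "3", "Question3", "5"),
  stringsAsFactors = FALSE
)

# TRUE for question rows, FALSE for answer rows
isQuestion <- grepl("^Question", logs$Logs)

# Slicing is only safe if rows alternate question, answer, question, answer, ...
expected <- rep(c(TRUE, FALSE), length.out = nrow(logs))
stopifnot(nrow(logs) %% 2 == 0, identical(isQuestion, expected))

# One row per question, with the answer converted to a number
long <- data.frame(
  question      = logs$Logs[isQuestion],
  questionTime  = logs$Time[isQuestion],
  responseValue = as.numeric(logs$Logs[!isQuestion]),
  responseTime  = logs$Time[!isQuestion],
  stringsAsFactors = FALSE
)
long
```

With several respondents, the same check would presumably need to be run per respondent, for example after splitting the data on the respondent column.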

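Edit

With the additional respondent column, you can use something like dcast to turn the table above into the layout you are looking for. Here are the steps: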

library(reshape2)  # dcast() is assumed here to come from the reshape2 package

# One wide table per measure, with one column per question
qTime <- dcast(newDF, respondent ~ question, value.var = "questionTime")
names(qTime)[-1] <- paste0("TimeQ", seq_len(ncol(qTime) - 1))

rValue <- dcast(newDF, respondent ~ question, value.var = "responseValue")

rTime <- dcast(newDF, respondent ~ question, value.var = "responseTime")
names(rTime)[-1] <- paste0("TimeA", seq_len(ncol(rTime) - 1))

# Drop the repeated respondent column before binding the tables side by side
finalDF <- cbind(qTime, rValue[, -1], rTime[, -1])

finalDF

#     respondent   TimeQ1   TimeQ2   TimeQ3 Question1 Question2 Question3   TimeA1   TimeA2   TimeA3
#   Respondent 1 446.6204 452.7516 458.1999         4         3         5 452.7516 458.1999 460.2342
#   Respondent 2 535.9445 543.3019 549.8399         5         2         3 543.3019 549.8399 552.2810
#   Respondent 3 443.2204 449.3516 454.7999         1         4         5 449.3516 454.7999 456.8342
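For comparison, base R's `reshape()` can widen a long table of this shape in a single call, without any extra package. This is only a sketch under assumptions: `long` is a hypothetical, cut-down stand-in for `newDF` (two respondents, two questions, numeric answers), and the resulting column names follow `reshape()`'s own `variable.time` pattern, not the `TimeQ1`/`TimeA1` names used above.

```r
# Hypothetical stand-in for the long table, with the same column names as newDF
long <- data.frame(
  respondent    = rep(c("Respondent 1", "Respondent 2"), each = 2),
  question      = rep(c("Question1", "Question2"), times = 2),
  questionTime  = c(446.6204, 452.7516, 535.9445, 543.3019),
  responseValue = c(4, 3, 5, 2),
  responseTime  = c(452.7516, 458.1999, 543.3019, 549.8399),
  stringsAsFactors = FALSE
)

# One row per respondent; every other column is spread out once per question
wide <- reshape(long,
                idvar     = "respondent",
                timevar   = "question",
                direction = "wide")

names(wide)
wide
```

The columns come out grouped by question (all three measures for Question1, then all three for Question2), so renaming them may be all that is left to do.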

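If you really want the exact layout from the question, you will have to rearrange the columns of finalDF, but in general this should do it.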
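One possible way to do that rearranging is to build the interleaved column names and index with them. This is a rough sketch, not part of the original answer: it assumes the columns are named `TimeQ1..n`, `Question1..n`, and `TimeA1..n` as in the output above, and it uses a small stand-in for `finalDF` with two questions so that it runs on its own. The name `interleaved` is made up for this example.

```r
# Small stand-in for finalDF (first respondent, two questions)
finalDF <- data.frame(
  respondent = "Respondent 1",
  TimeQ1 = 446.6204, TimeQ2 = 452.7516,
  Question1 = 4, Question2 = 3,
  TimeA1 = 452.7516, TimeA2 = 458.1999
)

# Three columns per question, plus the respondent column
n <- (ncol(finalDF) - 1) / 3

# rbind() stacks the three name vectors as rows; as.vector() then reads the
# matrix column by column, giving TimeQ1, Question1, TimeA1, TimeQ2, ...
interleaved <- as.vector(rbind(paste0("TimeQ", seq_len(n)),
                               paste0("Question", seq_len(n)),
                               paste0("TimeA", seq_len(n))))

finalDF <- finalDF[, c("respondent", interleaved)]
names(finalDF)
```

Selecting columns by name means the result does not depend on the order in which the wide tables were bound together.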