Question

有没有办法按以下格式重新整形数据？

    Date       Student Test.1 Test.2 Test.3 
    2007/02/01   A      80      90     70  
    2007/02/01   B      90      60     90  
    2007/02/01   C      75      70     80  
    2007/02/01   D      50      80     70

采用以下格式？

    Date       Student Result  Test 
    2007/02/01   A      80       1   
    2007/02/01   A      90       2   
    2007/02/01   A      70       3   
    2007/02/01   B      90       1   
    2007/02/01   B      60       2   
    2007/02/01   B      90       3   
    2007/02/01   C      75       1   
    2007/02/01   C      70       2   
    2007/02/01   C      80       3   
    2007/02/01   D      50       1   
    2007/02/01   D      80       2
    2007/02/01   D      70       3

Answer 1

这样的事情会：

reshape(x, direction='long', varying=paste('Test', 1:3, sep='.'))
          Date Student time Test id
1.1 2007/02/01       A    1   80  1
2.1 2007/02/01       B    1   90  2
3.1 2007/02/01       C    1   75  3
4.1 2007/02/01       D    1   50  4
1.2 2007/02/01       A    2   90  1
2.2 2007/02/01       B    2   60  2
3.2 2007/02/01       C    2   70  3
4.2 2007/02/01       D    2   80  4
1.3 2007/02/01       A    3   70  1
2.3 2007/02/01       B    3   90  2
3.3 2007/02/01       C    3   80  3
4.3 2007/02/01       D    3   70  4

然后，您可以根据需要重命名列。请注意，此处的time列是您在所需输出中标记为Test的内容。这就是宽格式的列以长格式区分的方式。

Answer 2

melt()功能可能会有所帮助：

library(reshape)
md <- melt(df, id=c('Date','Student')

生成的“融化”数据框将是这样的：

      Date Student variable value
2007/02/01       A    Test.1   80
2007/02/01       B    Test.1   90
2007/02/01       C    Test.1   75
2007/02/01       D    Test.1   50
2007/02/01       A    Test.1   90
...

然后，您可以重命名列和/或修改值以满足您的需求。

然后，融合的数据框可以与cast()函数一起使用，以创建类似数据透视的数据框。检查the Quick-R tutorial: Reshaping data。

Answer 3

要完成常用方法的综述，您可以查看“dplyr”+“tidyr”，它们可以像这样使用：

library(dplyr)
library(tidyr)

mydf %>% gather(Time, Score, starts_with("Test"))
#          Date Student   Time Score
# 1  2007/02/01       A Test.1    80
# 2  2007/02/01       B Test.1    90
# 3  2007/02/01       C Test.1    75
# 4  2007/02/01       D Test.1    50
# 5  2007/02/01       A Test.2    90
# 6  2007/02/01       B Test.2    60
# 7  2007/02/01       C Test.2    70
# 8  2007/02/01       D Test.2    80
# 9  2007/02/01       A Test.3    70
# 10 2007/02/01       B Test.3    90
# 11 2007/02/01       C Test.3    80
# 12 2007/02/01       D Test.3    70

要获得您正在寻找的具体表单，您可以通过separate和select进一步采取措施：

mydf %>% 
  gather(Time, Score, starts_with("Test")) %>% 
  separate(Time, c("Stub", "Test")) %>% 
  select(-Stub)
#          Date Student Test Score
# 1  2007/02/01       A    1    80
# 2  2007/02/01       B    1    90
# 3  2007/02/01       C    1    75
# 4  2007/02/01       D    1    50
# 5  2007/02/01       A    2    90
# 6  2007/02/01       B    2    60
# 7  2007/02/01       C    2    70
# 8  2007/02/01       D    2    80
# 9  2007/02/01       A    3    70
# 10 2007/02/01       B    3    90
# 11 2007/02/01       C    3    80
# 12 2007/02/01       D    3    70

重塑R中数据框中的数据

3 个答案: