Reshape data: new variables based on column names

Time: 2020-06-19 09:19:36

Tags: r

I want to reshape a data set from wide to long format.

The data set contains roughly 300 variables, each named according to the pattern ModelID_Emotion_ModelGender. Here is some sample data:

df <- structure(list(X71_Anger_Male = structure(c(3L, 1L, 2L), .Label = c("Anger", 
"Disgust", "Fear"), class = "factor"), X71_Disgus_Male = structure(c(2L, 
1L, 1L), .Label = c("Disgust", "Fear"), class = "factor")), class = "data.frame", row.names = c(NA, 
-3L))
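Both answers below assume that every column name splits into exactly three `_`-separated parts. A quick base R check for that, sketched on the two sample columns (the full data set is assumed to follow the same naming pattern):

```r
# Sample data from the question, rebuilt with factor() for readability
df <- data.frame(
  X71_Anger_Male  = factor(c("Fear", "Anger", "Disgust"),
                           levels = c("Anger", "Disgust", "Fear")),
  X71_Disgus_Male = factor(c("Fear", "Disgust", "Disgust"),
                           levels = c("Disgust", "Fear"))
)

# Split each column name on "_" (fixed = TRUE: literal match, not a regex)
name_parts <- strsplit(names(df), "_", fixed = TRUE)

# Any name without exactly three parts would break the reshape
bad_names <- names(df)[lengths(name_parts) != 3]
bad_names  # character(0) when every name fits ModelID_Emotion_ModelGender
```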

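I want to process the data so that the information in the column names is extracted and stored in new variables. For example, there should be a new variable ModelGender, a new variable ModelID, and a new variable Emotion. The data set should then look like this: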

structure(list(Gender = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "Male", class = "factor"), 
    ModelNumber = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "X71", class = "factor"), 
    Emotion = structure(c(2L, 2L, 2L, 1L, 1L, 1L), .Label = c("Anger", 
    "Disgust"), class = "factor"), Response = structure(c(3L, 
    2L, 2L, 3L, 1L, 2L), .Label = c("Anger", "Disgust", "Fear"
    ), class = "factor")), class = "data.frame", row.names = c(NA, 
-6L))

When I use reshape, gather/spread, or melt/cast, I don't get the desired result. Does anyone know how to do this?
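For comparison, a base R sketch that avoids `reshape()` altogether (not taken from either answer): stack all columns into one, then split the stacked column names into the three fields. It assumes the factor columns may be converted to character first, because the two columns have different factor levels and `stack()` skips factor columns.

```r
# Sample data from the question, rebuilt with factor() for readability
df <- data.frame(
  X71_Anger_Male  = factor(c("Fear", "Anger", "Disgust"),
                           levels = c("Anger", "Disgust", "Fear")),
  X71_Disgus_Male = factor(c("Fear", "Disgust", "Disgust"),
                           levels = c("Disgust", "Fear"))
)

# Convert each factor column to character, then stack them column by column.
# The result has `values` (cell contents) and `ind` (source column name).
long <- stack(lapply(df, as.character))

# Split the source column names into a 3-column character matrix
parts <- do.call(rbind, strsplit(as.character(long$ind), "_", fixed = TRUE))

result <- data.frame(
  ModelNumber = parts[, 1],
  Emotion     = parts[, 2],
  Gender      = parts[, 3],
  Response    = long$values,
  stringsAsFactors = TRUE  # factor columns, as in the desired output
)
result
```

Rows come out grouped by source column (all `Anger` rows first), so the order differs from the desired output but the content is the same.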

Thank you for your time!

2 answers:

Answer 0 (score: 1)

You can simply convert to long format and then split the column you need. One tidyverse approach could be:

library(dplyr)
library(tidyr)

df %>% 
 pivot_longer(everything()) %>% 
 separate(name, into = c('ModelNumber', 'Emotion', 'Gender'), sep = '_')
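By default `pivot_longer(everything())` names its output columns `name` and `value`, so the responses end up in a column called `value`. A sketch of the same pipeline that passes `values_to` so the column is called `Response` as in the question, assuming tidyr 1.0.0 or later (where `pivot_longer()` was introduced):

```r
library(dplyr)
library(tidyr)

# Sample data from the question, rebuilt with factor() for readability
df <- data.frame(
  X71_Anger_Male  = factor(c("Fear", "Anger", "Disgust"),
                           levels = c("Anger", "Disgust", "Fear")),
  X71_Disgus_Male = factor(c("Fear", "Disgust", "Disgust"),
                           levels = c("Disgust", "Fear"))
)

result <- df %>%
  # Stack every column; cell values go into `Response`, column names into `name`
  pivot_longer(everything(), values_to = "Response") %>%
  # Replace `name` with three new columns at the same position
  separate(name, into = c("ModelNumber", "Emotion", "Gender"), sep = "_")
result
```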

Answer 1 (score: 1)

In pivot_longer, you can set names_sep to "_" to split the column names into 3 columns.

tidyr::pivot_longer(df, cols = everything(),
                        names_to = c('ModelNumber', 'Emotion', 'Gender'), 
                        values_to = 'Response',
                        names_sep = '_')

# A tibble: 6 x 4
#  ModelNumber Emotion Gender Response
#  <chr>       <chr>   <chr>  <fct>   
#1 X71         Anger   Male   Fear    
#2 X71         Disgus  Male   Fear    
#3 X71         Anger   Male   Anger   
#4 X71         Disgus  Male   Disgust 
#5 X71         Anger   Male   Disgust 
#6 X71         Disgus  Male   Disgust
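`names_sep` works here because `_` only appears between the three parts. If the names were less regular, `pivot_longer()` also accepts `names_pattern`, a regular expression with one capture group per entry in `names_to`. A sketch, assuming each of the three parts itself contains no underscore:

```r
library(tidyr)

# Sample data from the question, rebuilt with factor() for readability
df <- data.frame(
  X71_Anger_Male  = factor(c("Fear", "Anger", "Disgust"),
                           levels = c("Anger", "Disgust", "Fear")),
  X71_Disgus_Male = factor(c("Fear", "Disgust", "Disgust"),
                           levels = c("Disgust", "Fear"))
)

result <- pivot_longer(
  df,
  cols = everything(),
  names_to = c("ModelNumber", "Emotion", "Gender"),
  # One capture group per names_to entry; [^_]+ = one or more non-underscore characters
  names_pattern = "([^_]+)_([^_]+)_([^_]+)",
  values_to = "Response"
)
result
```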