Create a new set of columns from two existing sets of columns based on a factor value

Time: 2018-05-21 15:57:26

Tags: r

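My variables are named "VA01_01", "VA01_02", etc., and "VA02_01", "VA02_02". The variables with the prefix VA01 hold data from female participants, and those with the prefix VA02 hold data from male participants. A male participant, for example, has NA in the VA01 variables. I already have a factor variable that records sex.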

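What I want to do is create a new set of variables that takes over the values from both variable types. That is, for a male participant, the new variables get the values of the VA02 variables. The new set of variables would therefore no longer contain any NAs, because it would not depend on sex.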

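Does anyone have a simple solution to this problem? I don't know whether reshape is the answer, because I really don't want to convert my data frame to long format.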

Here is what the data looks like at the start:

 structure(list(sex = structure(c(1L, 2L, 1L, 2L), .Label = c("female", 
 "male"), class = "factor"), VA01_01 = c(1, NA, 2, NA), VA01_02 = c(4, 
 NA, 4, NA), VA02_01 = c(NA, 3, NA, 4), VA02_02 = c(NA, 5, NA, 
 3)), .Names = c("sex", "VA01_01", "VA01_02", "VA02_01", "VA02_02"
 ), row.names = c(NA, -4L), class = "data.frame")

And at the end (I would like to keep the original variables):

structure(list(sex = structure(c(1L, 2L, 1L, 2L), .Label = c("female", 
"male"), class = "factor"), VA_tot_01 = c(1, 3, 2, 4), VA_tot_02 = c(4, 
5, 4, 3), VA01_01 = c(1, NA, 2, NA), VA01_02 = c(4, NA, 4, NA
), VA02_01 = c(NA, 3, NA, 4), VA02_02 = c(NA, 5, NA, 3)), .Names = c("sex", 
"VA_tot_01", "VA_tot_02", "VA01_01", "VA01_02", "VA02_01", "VA02_02"
), row.names = c(NA, -4L), class = "data.frame")

1 Answer:

Answer 0: (score: 0)

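Since the VA01 and VA02 variables do not overlap (each row has a value in only one of them), you can simply create another variable, VA_tot_xx, that holds the original values of both. It would look something like this: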

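new_vars <- function(df) {
  # Collect the unique item suffixes ("_01", "_02", ...) from the column names
  vars <- unique(gsub(
    pattern = ".*_", 
    replacement = "_", 
    x = grep(
      pattern = "_[0-9]{2}$", 
      x = names(df), 
      value = TRUE
    )
  ))
  for (i in vars) {
    new_name <- paste0("VA_tot", i)
    female_name <- paste0("VA01", i)  # VA01_* holds the female participants' data
    male_name <- paste0("VA02", i)    # VA02_* holds the male participants' data
    # Start with NA, then copy in the non-missing values from each source column
    df[[new_name]] <- NA
    df[[new_name]][!is.na(df[[male_name]])] <- 
      df[[male_name]][!is.na(df[[male_name]])]
    df[[new_name]][!is.na(df[[female_name]])] <- 
      df[[female_name]][!is.na(df[[female_name]])]
  }
  return(df)
}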

It could probably be prettier, but it does the job.

# Named `dat` rather than `c` to avoid masking the base function c()
dat <- structure(
  list(
    sex = structure(
      c(1L, 2L, 1L, 2L),
      .Label = c("female", "male"),
      class = "factor"
    ),
    VA01_01 = c(1, NA, 2, NA),
    VA01_02 = c(4, NA, 4, NA),
    VA02_01 = c(NA, 3, NA, 4),
    VA02_02 = c(NA, 5, NA, 3)
  ),
  .Names = c("sex", "VA01_01", "VA01_02", "VA02_01", "VA02_02"),
  row.names = c(NA, -4L),
  class = "data.frame"
)
new_vars(dat)

#       sex VA01_01 VA01_02 VA02_01 VA02_02 VA_tot_01 VA_tot_02
# 1 female        1       4      NA      NA         1         4
# 2   male       NA      NA       3       5         3         5
# 3 female        2       4      NA      NA         2         4
# 4   male       NA      NA       4       3         4         3
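
The function above relies on the two column sets never overlapping and does not look at the sex factor at all. Since the question asks for columns chosen by factor value, here is a minimal alternative sketch that picks the source column from `sex` using base R's `ifelse()`. The helper name `add_totals_by_sex` and its parameters are hypothetical (they are not part of the original answer), and the sketch assumes the factor levels are exactly "female" and "male".

```r
# Hypothetical helper (not from the original answer): choose the source
# column for each row according to the `sex` factor.
add_totals_by_sex <- function(df,
                              female_prefix = "VA01",
                              male_prefix = "VA02",
                              new_prefix = "VA_tot") {
  # Find the female-prefixed columns and strip the prefix to get "_01", "_02", ...
  female_cols <- grep(paste0("^", female_prefix, "_[0-9]{2}$"),
                      names(df), value = TRUE)
  suffixes <- sub(paste0("^", female_prefix), "", female_cols)

  for (s in suffixes) {
    # Rows where sex is "female" take the VA01 value, all other rows take
    # the VA02 value; rows with a missing sex end up as NA.
    df[[paste0(new_prefix, s)]] <- ifelse(
      df$sex == "female",
      df[[paste0(female_prefix, s)]],
      df[[paste0(male_prefix, s)]]
    )
  }
  df
}

# Same example data as in the question
example_df <- data.frame(
  sex = factor(c("female", "male", "female", "male")),
  VA01_01 = c(1, NA, 2, NA),
  VA01_02 = c(4, NA, 4, NA),
  VA02_01 = c(NA, 3, NA, 4),
  VA02_02 = c(NA, 5, NA, 3)
)

res <- add_totals_by_sex(example_df)
res[, c("sex", "VA_tot_01", "VA_tot_02")]
```

On the example data this should give the same VA_tot_01 and VA_tot_02 values as the output shown above. The difference only matters if a row ever has values in both column sets: this version follows the sex factor, while `new_vars()` keeps whichever value is assigned last.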