Moving data from right-hand columns to left-hand columns in a tibble

Time: 2018-07-10 07:44:47

Tags: r dplyr stringr

I have a tibble of diagnosis information:

data <- tibble(
  id = c(1:10),
  diagnosis_1 = c("F32", "F431", "R58", "S32", "F11", NA, NA, "Y67", "F32", "Z032"),
  diagnosis_2 = c(NA, NA, NA, NA, NA, NA, "G35", NA, NA, NA),
  diagnosis_3 = c("F40", NA, "R67", "F431", NA, "F60", "S58", "R68", "F11", NA),
  diagnosis_4 = c(NA, NA, "F65", NA, "F19", NA, NA, "F32", NA, NA)
)

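As part of the cleaning process, I removed every diagnosis that does not meet a specific criterion (i.e. that does not start with the letter F, G, or Z), using the following code: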

data$diagnosis_1[str_sub(data$diagnosis_1, 1,1) %in% c("R", "S", "Y")] <- NA
data$diagnosis_2[str_sub(data$diagnosis_2, 1,1) %in% c("R", "S", "Y")] <- NA
data$diagnosis_3[str_sub(data$diagnosis_3, 1,1) %in% c("R", "S", "Y")] <- NA
data$diagnosis_4[str_sub(data$diagnosis_4, 1,1) %in% c("R", "S", "Y")] <- NA

This leaves me with the following tibble:

[Image: the tibble after cleaning, with the diagnoses starting with R, S, or Y replaced by NA]
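For reference, a minimal sketch of the cleaning step applied to all diagnosis columns at once. It assumes a plain data.frame instead of a tibble so that it runs in base R only, and it uses a whitelist (keep F, G, Z) rather than the blacklist (drop R, S, Y) from the question; the two agree on this example data but not necessarily in general. The helper name keep_fgz is made up for illustration.

```r
# Example data from the question, as a base-R data.frame (assumption: the post uses a tibble).
data <- data.frame(
  id = 1:10,
  diagnosis_1 = c("F32", "F431", "R58", "S32", "F11", NA, NA, "Y67", "F32", "Z032"),
  diagnosis_2 = c(NA, NA, NA, NA, NA, NA, "G35", NA, NA, NA),
  diagnosis_3 = c("F40", NA, "R67", "F431", NA, "F60", "S58", "R68", "F11", NA),
  diagnosis_4 = c(NA, NA, "F65", NA, "F19", NA, NA, "F32", NA, NA),
  stringsAsFactors = FALSE
)

# Names of all diagnosis columns.
diag_cols <- grep("^diagnosis_", names(data), value = TRUE)

# Hypothetical helper: keep codes whose first letter is F, G, or Z; set the rest to NA.
# Values that are already NA stay NA.
keep_fgz <- function(x) {
  x[!substr(x, 1, 1) %in% c("F", "G", "Z")] <- NA
  x
}

data[diag_cols] <- lapply(data[diag_cols], keep_fgz)
print(data)
```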

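I now need to shift the data to the left so that the columns are filled from left to right (i.e. diagnosis_1 must not be empty if diagnosis_2, diagnosis_3, or diagnosis_4 contains data). I tried a vectorised approach with ifelse(), but I cannot get it to work with multiple nested ifelse() calls.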

ifelse(is.na(data$diagnosis_1), data$diagnosis_2, data$diagnosis_1)

Any suggestions are much appreciated.

Edit: added the expected output:

[Image: the expected output]

4 answers:

Answer 0: (score: 2)

You can use Reduce together with coalesce from dplyr (df below stands for the cleaned tibble from the question), i.e.

df$diagnosis_1 <- Reduce(dplyr::coalesce, df[-1])

#      id diagnosis_1 diagnosis_2 diagnosis_3 diagnosis_4
#   <int> <chr>       <chr>       <chr>       <chr>      
# 1     1 F32         <NA>        F40         <NA>       
# 2     2 F431        <NA>        <NA>        <NA>       
# 3     3 F65         <NA>        <NA>        F65        
# 4     4 F431        <NA>        F431        <NA>       
# 5     5 F11         <NA>        <NA>        F19        
# 6     6 F60         <NA>        F60         <NA>       
# 7     7 G35         G35         <NA>        <NA>       
# 8     8 F32         <NA>        <NA>        F32        
# 9     9 F32         <NA>        F11         <NA>       
#10    10 Z032        <NA>        <NA>        <NA> 
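
To connect this with the nested ifelse() attempt in the question: a rough base-R sketch of the same fold, assuming a plain data.frame that holds the already cleaned values. The helper first_non_na is a made-up name that plays the role of a two-argument coalesce.

```r
# Cleaned data from the question as a base-R data.frame (assumption: the post uses a tibble).
df <- data.frame(
  id = 1:10,
  diagnosis_1 = c("F32", "F431", NA, NA, "F11", NA, NA, NA, "F32", "Z032"),
  diagnosis_2 = c(NA, NA, NA, NA, NA, NA, "G35", NA, NA, NA),
  diagnosis_3 = c("F40", NA, NA, "F431", NA, "F60", NA, NA, "F11", NA),
  diagnosis_4 = c(NA, NA, "F65", NA, "F19", NA, NA, "F32", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: take b wherever a is missing (one level of the nested ifelse()).
first_non_na <- function(a, b) ifelse(is.na(a), b, a)

# Reduce() folds the helper over the diagnosis columns from left to right, i.e.
# first_non_na(first_non_na(first_non_na(diagnosis_1, diagnosis_2), diagnosis_3), diagnosis_4).
first_diag <- Reduce(first_non_na, df[-1])
print(first_diag)
```

Like the coalesce version, this only fills diagnosis_1 with the first non-NA value per row. It does not clear the column the value came from, which is why F65 appears in both diagnosis_1 and diagnosis_4 for id 3 in the output above.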

Answer 1: (score: 2)

We first replace the values that start with "R", "S", or "Y" with NA, and then shift the non-NA values to the left.

data[-1] <- lapply(data[-1], function(x) replace(x, grepl("^[RSY]", x), NA))
data[] <- t(apply(data, 1, function(x) `length<-`(na.omit(x), length(x))))

data
# A tibble: 10 x 5
#     id diagnosis_1 diagnosis_2 diagnosis_3 diagnosis_4
#   <chr> <chr>       <chr>       <chr>       <chr>      
# 1 " 1"  F32         F40         NA          NA         
# 2 " 2"  F431        NA          NA          NA         
# 3 " 3"  F65         NA          NA          NA         
# 4 " 4"  F431        NA          NA          NA         
# 5 " 5"  F11         F19         NA          NA         
# 6 " 6"  F60         NA          NA          NA         
# 7 " 7"  G35         NA          NA          NA         
# 8 " 8"  F32         NA          NA          NA         
# 9 " 9"  F32         F11         NA          NA         
#10  10   Z032        NA          NA          NA    

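The step that shifts the non-NA values to the left is taken from David's answer to a linked question. You can also try any of the other approaches from that question to shift the values.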
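
Because apply() runs over every column, id ends up as a character column in the output above. A possible variant, sketched here under the assumption of a plain data.frame with the already cleaned values, restricts the shift to the diagnosis columns so that id stays an integer. The names diag_cols and shift_left are made up for illustration.

```r
# Cleaned data from the question as a base-R data.frame (assumption: the post uses a tibble).
data <- data.frame(
  id = 1:10,
  diagnosis_1 = c("F32", "F431", NA, NA, "F11", NA, NA, NA, "F32", "Z032"),
  diagnosis_2 = c(NA, NA, NA, NA, NA, NA, "G35", NA, NA, NA),
  diagnosis_3 = c("F40", NA, NA, "F431", NA, "F60", NA, NA, "F11", NA),
  diagnosis_4 = c(NA, NA, "F65", NA, "F19", NA, NA, "F32", NA, NA),
  stringsAsFactors = FALSE
)

diag_cols <- grep("^diagnosis_", names(data), value = TRUE)

# Hypothetical helper: move the non-NA values of one row to the front, pad the tail with NA.
shift_left <- function(x) {
  x <- unname(x)
  c(x[!is.na(x)], rep(NA_character_, sum(is.na(x))))
}

# apply() returns one column per input row, so transpose back to rows x columns.
shifted <- t(apply(as.matrix(data[diag_cols]), 1, shift_left))
data[diag_cols] <- as.data.frame(shifted, stringsAsFactors = FALSE)
str(data)
```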

Answer 2: (score: 2)

You can try a tidyverse approach:

library(tidyverse)
data %>% 
  mutate_at(vars(starts_with("diagnosis")), funs(ifelse(str_sub(., 1, 1) %in% c("R", "S", "Y"), NA, .))) %>% 
  gather(k,v, -id) %>% 
  group_by(id) %>% 
  arrange(id) %>% 
  mutate(v=ifelse(k == "diagnosis_1", v[!is.na(v)][1], v)) %>% 
  spread(k, v)
# A tibble: 10 x 5
# Groups:   id [10]
      id diagnosis_1 diagnosis_2 diagnosis_3 diagnosis_4
   <int> <chr>       <chr>       <chr>       <chr>      
 1     1 F32         NA          F40         NA         
 2     2 F431        NA          NA          NA         
 3     3 F65         NA          NA          F65        
 4     4 F431        NA          F431        NA         
 5     5 F11         NA          NA          F19        
 6     6 F60         NA          F60         NA         
 7     7 G35         G35         NA          NA         
 8     8 F32         NA          NA          F32        
 9     9 F32         NA          F11         NA         
10    10 Z032        NA          NA          NA 

Since it is not yet clear what the OP wants (see the discussion below), you can also try

data %>% 
  mutate_at(vars(starts_with("diagnosis")), funs(ifelse(str_sub(., 1, 1) %in% c("R", "S", "Y"), NA, .))) %>% 
  gather(k,v, -id) %>% 
  group_by(id) %>% 
  arrange(id) %>% 
  mutate(v=c(v[!is.na(v)], rep(NA, length(v) - length(v[!is.na(v)])))) %>% 
  spread(k, v)
# A tibble: 10 x 5
# Groups:   id [10]
      id diagnosis_1 diagnosis_2 diagnosis_3 diagnosis_4
   <int> <chr>       <chr>       <chr>       <chr>      
 1     1 F32         F40         NA          NA         
 2     2 F431        NA          NA          NA         
 3     3 F65         NA          NA          NA         
 4     4 F431        NA          NA          NA         
 5     5 F11         F19         NA          NA         
 6     6 F60         NA          NA          NA         
 7     7 G35         NA          NA          NA         
 8     8 F32         NA          NA          NA         
 9     9 F32         F11         NA          NA         
10    10 Z032        NA          NA          NA

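Answer 3: (score: 1)

Using dplyr and tidyr: reshape from wide to long, exclude the diagnoses that start with R, S, or Y as well as the NA values, then reshape from long back to wide.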

library(dplyr)
library(tidyr)

gather(data, key = "k", value = "v", -id) %>% 
  filter(!(grepl("^[RSY]", v) | is.na(v))) %>% 
  group_by(id) %>% 
  mutate(diagN = paste0("diagnosis_", row_number())) %>% 
  select(-k) %>% 
  spread(key = "diagN", value = "v") %>% 
  ungroup()

# # A tibble: 10 x 3
#       id diagnosis_1 diagnosis_2
#    <int> <chr>       <chr>      
#  1     1 F32         F40        
#  2     2 F431        NA         
#  3     3 F65         NA         
#  4     4 F431        NA         
#  5     5 F11         F19        
#  6     6 F60         NA         
#  7     7 G35         NA         
#  8     8 F32         NA         
#  9     9 F32         F11        
# 10    10 Z032        NA
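
gather() and spread() have since been superseded in tidyr. A rough equivalent of the same wide-long-wide idea with pivot_longer() and pivot_wider(), assuming tidyr 1.0.0 or later is installed (the name result is only for illustration):

```r
library(dplyr)
library(tidyr)

# Raw example data from the question.
data <- tibble(
  id = c(1:10),
  diagnosis_1 = c("F32", "F431", "R58", "S32", "F11", NA, NA, "Y67", "F32", "Z032"),
  diagnosis_2 = c(NA, NA, NA, NA, NA, NA, "G35", NA, NA, NA),
  diagnosis_3 = c("F40", NA, "R67", "F431", NA, "F60", "S58", "R68", "F11", NA),
  diagnosis_4 = c(NA, NA, "F65", NA, "F19", NA, NA, "F32", NA, NA)
)

result <- data %>%
  # Wide to long, dropping NA diagnoses along the way.
  pivot_longer(-id, names_to = "k", values_to = "v", values_drop_na = TRUE) %>%
  # Drop the diagnoses that start with R, S, or Y.
  filter(!grepl("^[RSY]", v)) %>%
  # Renumber the remaining diagnoses per id, left to right.
  group_by(id) %>%
  mutate(k = paste0("diagnosis_", row_number())) %>%
  ungroup() %>%
  # Long back to wide, one column per remaining position.
  pivot_wider(names_from = k, values_from = v)

print(result)
```

As with the gather()/spread() version, only as many diagnosis columns are created as are actually needed, and an id with no remaining diagnosis would drop out of the result entirely.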