Replace duplicates with NA across rows

Time: 2019-01-08 09:35:23

Tags: r duplicates

I have a data frame in R that looks like this:

ID sex height coordinate.1 coordinate.2 coordinate.3 coordinate.4
12 m 1.81 1223 NA NA 1223
13 f 1.65 5664 4667 NA 4667
15 m 1.78 6663 NA 6663 NA

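For each row, I want to keep only the unique values across the four coordinate variables; any duplicates should be replaced with NA. The result should look like this: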

ID sex height coordinate.1 coordinate.2 coordinate.3 coordinate.4
12 m 1.81 1223 NA NA NA
13 f 1.65 5664 4667 NA NA
15 m 1.78 6663 NA NA NA

Any ideas on how to achieve this?
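For anyone who wants to run the answers, the example data can be rebuilt roughly as follows. This is only a sketch: the column types (integer coordinates, character `sex`) are assumptions, since the question shows printed output rather than `dput()` output.

```r
# Input data as printed in the question (column types are assumed)
df <- data.frame(
  ID           = c(12L, 13L, 15L),
  sex          = c("m", "f", "m"),
  height       = c(1.81, 1.65, 1.78),
  coordinate.1 = c(1223L, 5664L, 6663L),
  coordinate.2 = c(NA, 4667L, NA),
  coordinate.3 = c(NA, NA, 6663L),
  coordinate.4 = c(1223L, 4667L, NA)
)

# Desired result: within each row, repeated coordinates become NA
expected <- df
expected$coordinate.3 <- NA_integer_
expected$coordinate.4 <- NA_integer_
```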

2 answers:

Answer 0: (score: 1)

Using apply on each row, we replace the duplicated values with NA:

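# Positions of the coordinate columns
cols <- grep("^coordinate", names(df))
# Row by row, set repeated values to NA; apply() returns each row as a column, hence t()
df[cols] <- t(apply(df[cols], 1, function(x) replace(x, duplicated(x), NA)))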

df
#  ID sex height coordinate.1 coordinate.2 coordinate.3 coordinate.4
#1 12   m   1.81         1223           NA           NA           NA
#2 13   f   1.65         5664         4667           NA           NA
#3 15   m   1.78         6663           NA           NA           NA
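To see why this works, it may help to look at a single row in isolation. The sketch below uses a standalone vector `x` and a small matrix `m` (both made up for illustration, mirroring the first two rows of the question). Note that `duplicated()` also flags the second `NA`, which is harmless here because it is replaced with `NA` anyway.

```r
# One row of coordinates, as a plain vector
x <- c(1223, NA, NA, 1223)

dups <- duplicated(x)            # FALSE FALSE TRUE TRUE
cleaned <- replace(x, dups, NA)  # 1223 NA NA NA

# apply() over rows returns one *column* per input row,
# so the result has to be transposed before assigning it back
m <- matrix(c(1223, 5664, NA, 4667, NA, NA, 1223, 4667), nrow = 2)
res <- apply(m, 1, function(r) replace(r, duplicated(r), NA))
dim(res)     # 4 2
dim(t(res))  # 2 4
```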

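A tidyverse approach is to create a row number for each row with row_number(), gather all the coordinate columns into long format, group_by the row number (ind), replace the duplicated values with NA, and spread the data back to wide format.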

library(tidyverse)

df %>%
  mutate(ind = row_number()) %>%
  gather(key, value, -(c(ind, ID:height))) %>%
  group_by(ind) %>%
  mutate(value = replace(value, duplicated(value), NA)) %>%
  spread(key, value) %>%
  ungroup() %>%
  select(-ind)


#       ID sex   height coordinate.1 coordinate.2 coordinate.3 coordinate.4
#     <int> <fct>  <dbl>        <int>        <int>        <int>        <int>
#1       12 m       1.81         1223           NA           NA           NA
#2       13 f       1.65         5664         4667           NA           NA
#3       15 m       1.78         6663           NA           NA           NA
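Since tidyr 1.0.0, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The same idea could be written with the newer functions roughly like this. It is a sketch rather than part of the original answer, and the example data frame is rebuilt here with assumed column types so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Example data from the question (column types are assumed)
df <- data.frame(
  ID           = c(12L, 13L, 15L),
  sex          = c("m", "f", "m"),
  height       = c(1.81, 1.65, 1.78),
  coordinate.1 = c(1223L, 5664L, 6663L),
  coordinate.2 = c(NA, 4667L, NA),
  coordinate.3 = c(NA, NA, 6663L),
  coordinate.4 = c(1223L, 4667L, NA)
)

res <- df %>%
  mutate(ind = row_number()) %>%                      # row identifier
  pivot_longer(starts_with("coordinate"),
               names_to = "key", values_to = "value") %>%
  group_by(ind) %>%                                   # work within each original row
  mutate(value = replace(value, duplicated(value), NA)) %>%
  ungroup() %>%
  pivot_wider(names_from = key, values_from = value) %>%
  select(-ind)
```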

Answer 1: (score: 1)

Another interesting idea that avoids apply(..., MARGIN = 1, ...):

library(tidyverse)

stack(df[-c(1:3)]) %>% 
 mutate(values = replace(values, duplicated(values), NA)) %>% 
 unstack() %>% 
 bind_cols(df[c(1:3)], .)

which gives:

  ID sex height coordinate.1 coordinate.2 coordinate.3 coordinate.4
1 12   m   1.81         1223           NA           NA           NA
2 13   f   1.65         5664         4667           NA           NA
3 15   m   1.78         6663           NA           NA           NA
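One caveat worth noting: duplicated() on the stacked column looks for repeats across the whole data frame, not within each row. That gives the right answer for the example only because no coordinate value appears in two different rows. A possible per-row variant in base R is sketched below; the data frame `df2` and the helper column `row` are made up for illustration.

```r
# Hypothetical data in which 1223 appears in both rows
df2 <- data.frame(
  ID           = c(12L, 13L),
  sex          = c("m", "f"),
  height       = c(1.81, 1.65),
  coordinate.1 = c(1223L, 1223L),
  coordinate.2 = c(NA, 4667L),
  coordinate.3 = c(NA, NA),
  coordinate.4 = c(1223L, 4667L)
)

cols <- grep("^coordinate", names(df2))
long <- stack(df2[cols])

# Global check: wrongly flags row 2's coordinate.1 (the second stacked value)
naive <- replace(long$values, duplicated(long$values), NA)

# stack() goes column by column, so the row index repeats once per column
long$row <- rep(seq_len(nrow(df2)), times = length(cols))

# Per-row check: find duplicates within each original row only
long$values <- ave(long$values, long$row,
                   FUN = function(v) replace(v, duplicated(v), NA))

res <- cbind(df2[-cols], unstack(long, values ~ ind))
```

With this variant, row 2 keeps 1223 in coordinate.1, whereas the global check would blank it.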