Replace column values when another column's value changes

Time: 2019-03-28 05:43:06

Tags: r dataframe

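In the data below, I want to track the U and Value columns. As soon as the value in the Value column changes among rows that share the same value in the U column, I want to assign NA to the U column.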

Any suggestions on how to handle this efficiently?

Input data

data <- read.table(header = TRUE, text="
U   Value   Debug
A   1     1231
A   1     41
A   2     -1149
A   2     -2339
B   3     -3529
B   4     -4719
C   5     -5909
C   5     -7099
C   5     -8289
C   6     -9479
C   6     -10669
C   6     -11859
D   7     -13049
D   7     -14239
D   8     -15429
D   8     -16619")

Current table output

U   Value   Debug
A   1   1231
A   1   41
A   2   -1149
A   2   -2339
B   3   -3529
B   4   -4719
C   5   -5909
C   5   -7099
C   5   -8289
C   6   -9479
C   6   -10669
C   6   -11859
D   7   -13049
D   7   -14239
D   8   -15429
D   8   -16619

Expected table output

U   Value   Debug
A   1   1231
A   1   41
NA  2   -1149
NA  2   -2339
B   3   -3529
NA  4   -4719
C   5   -5909
C   5   -7099
C   5   -8289
NA  6   -9479
NA  6   -10669
NA  6   -11859
D   7   -13049
D   7   -14239
NA  8   -15429
NA  8   -16619

3 answers:

Answer 0: (score: 1)

Something like this?

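library(dplyr)

data %>%
    group_by(U) %>%
    mutate(
        # grp counts runs of Value within each U: it goes up by one whenever Value changes
        grp = cumsum(!(lag(Value, default = FALSE) == Value)),
        # keep U only for the first run; later runs become NA
        U.new = ifelse(grp == 1, as.character(U), NA))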
## A tibble: 16 x 5
## Groups:   U [4]
#   U     Value  Debug   grp U.new
#   <fct> <int>  <int> <int> <chr>
# 1 A         1   1231     1 A
# 2 A         1     41     1 A
# 3 A         2  -1149     2 NA
# 4 A         2  -2339     2 NA
# 5 B         3  -3529     1 B
# 6 B         4  -4719     2 NA
# 7 C         5  -5909     1 C
# 8 C         5  -7099     1 C
# 9 C         5  -8289     1 C
#10 C         6  -9479     2 NA
#11 C         6 -10669     2 NA
#12 C         6 -11859     2 NA
#13 D         7 -13049     1 D
#14 D         7 -14239     1 D
#15 D         8 -15429     2 NA
#16 D         8 -16619     2 NA

We are grouping by U, so I create a new column, U.new, here.

Following your comment, you can replace U with U.new:

data %>%
    group_by(U) %>%
    mutate(
        grp = cumsum(!(lag(Value, default = FALSE) == Value)),
        U.new = if_else(grp == 1, as.character(U), NA_character_)) %>%
    ungroup() %>%
    select(U = U.new, Value, Debug)
## A tibble: 16 x 3
#   U     Value  Debug
#   <chr> <int>  <int>
# 1 A         1   1231
# 2 A         1     41
# 3 NA        2  -1149
# 4 NA        2  -2339
# 5 B         3  -3529
# 6 NA        4  -4719
# 7 C         5  -5909
# 8 C         5  -7099
# 9 C         5  -8289
#10 NA        6  -9479
#11 NA        6 -10669
#12 NA        6 -11859
#13 D         7 -13049
#14 D         7 -14239
#15 NA        8 -15429
#16 NA        8 -16619
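For comparison, here is a minimal base R sketch of the same run-counter idea, without dplyr. The data frame `df` and the name `run_id` are made up for illustration. One difference to be aware of: this counter always starts at 1 on the first row of a group, whereas a counter built on a `lag()` default depends on the first Value not being equal to that default.

```r
# Illustrative data with the same shape as the question's table (not the full data set).
df <- data.frame(
  U     = c("A", "A", "A", "A", "B", "B"),
  Value = c(1, 1, 2, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Within each U, number the runs of Value: the counter starts at 1
# and increases by one every time Value differs from the previous row.
run_id <- ave(df$Value, df$U, FUN = function(v) {
  cumsum(c(TRUE, v[-1] != v[-length(v)]))
})

# Keep U only in the first run of each group; later runs become NA.
df$U.new <- ifelse(run_id == 1, df$U, NA_character_)
print(df)
```

For this small input, `U.new` comes out as A, A, NA, NA, B, NA.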

Answer 1: (score: 1)

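We can use data.table. Convert the data.frame to a data.table (setDT(data)), get the run-length ID of the Value column within each U (rleid, which increments every time the value changes), convert it to binary with the modulo operator (%%), and negate it (!) to get a logical vector, so that 0 becomes TRUE and 1 becomes FALSE. Then get the row indices (.I) of the TRUE values, extract that column ($V1), and use it as i to assign (:=) NA to U.

library(data.table)
setDT(data)[data[, .I[!rleid(Value) %% 2], U]$V1, U := NA]
data
#        U Value  Debug
#  1:    A     1   1231
#  2:    A     1     41
#  3: <NA>     2  -1149
#  4: <NA>     2  -2339
#  5:    B     3  -3529
#  6: <NA>     4  -4719
#  7:    C     5  -5909
#  8:    C     5  -7099
#  9:    C     5  -8289
# 10: <NA>     6  -9479
# 11: <NA>     6 -10669
# 12: <NA>     6 -11859
# 13:    D     7 -13049
# 14:    D     7 -14239
# 15: <NA>     8 -15429
# 16: <NA>     8 -16619

Update

Based on the discussion with the OP, we need to assign NA to U, within each U, wherever Value is not the first Value.

The same logic in [...]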
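The rule stated in the update (NA wherever Value is not the group's first Value) can be sketched in base R as follows. This is an illustration under assumptions, not the answer's own code: `df` and `first_value` are made-up names, and "first Value" is taken to mean the Value on the first row of each U group. The example uses a group with three runs, which is where this rule differs from the even/odd run-ID test: the third run also becomes NA.

```r
# Illustrative data: group C has three runs of Value (5, 6, 7), group D has one row.
df <- data.frame(
  U     = c("C", "C", "C", "C", "C", "C", "D"),
  Value = c(5, 5, 6, 6, 7, 7, 8),
  stringsAsFactors = FALSE
)

# For every row, look up the first Value seen in that row's U group.
first_value <- ave(df$Value, df$U, FUN = function(v) v[1])

# Blank out U wherever the row's Value differs from the group's first Value.
df$U[df$Value != first_value] <- NA
print(df)
```

Here U ends up as C, C, NA, NA, NA, NA, D.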

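Answer 2: (score: 0)

Another option with dplyr: for each group (U), find the first row where Value differs from the previous row, then set U to NA from that row onward.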

library(dplyr)

data %>%
  group_by(U) %>%
  mutate(U1 = replace(U, row_number() > which.max(diff(Value) != 0), NA))

#   U     Value  Debug U1   
#   <fct> <int>  <int> <fct>
# 1 A         1   1231 A    
# 2 A         1     41 A    
# 3 A         2  -1149 NA   
# 4 A         2  -2339 NA   
# 5 B         3  -3529 B    
# 6 B         4  -4719 NA   
# 7 C         5  -5909 C    
# 8 C         5  -7099 C    
# 9 C         5  -8289 C    
#10 C         6  -9479 NA   
#11 C         6 -10669 NA   
#12 C         6 -11859 NA   
#13 D         7 -13049 D    
#14 D         7 -14239 D    
#15 D         8 -15429 NA   
#16 D         8 -16619 NA   

If the Value column may contain non-numeric values, we can use lag instead of diff:

data %>%
  group_by(U) %>%
  mutate(U1 = replace(U, row_number() >= which.max(Value != lag(Value)), NA))
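One edge case worth checking with the `which.max()` approach is a group whose Value never changes: `which.max()` on an all-FALSE vector returns 1, so the comparison would still mark rows as NA. The sketch below shows that behavior and one possible guard; the helper name `changed_since_start` is hypothetical, and it assumes Value contains no NA.

```r
# which.max() on an all-FALSE logical vector returns 1 rather than "no match".
no_change <- c(5, 5, 5)
idx <- which.max(diff(no_change) != 0)
print(idx)  # 1

# Hypothetical helper: TRUE for every row at or after the first change in v,
# FALSE everywhere when v never changes. Uses != so it also works on characters.
changed_since_start <- function(v) {
  cumsum(c(FALSE, v[-1] != v[-length(v)])) > 0
}

print(changed_since_start(c(5, 5, 5)))     # FALSE FALSE FALSE
print(changed_since_start(c(1, 1, 2, 2)))  # FALSE FALSE  TRUE  TRUE
```

Inside the grouped `mutate()` above, this could be used as `replace(U, changed_since_start(Value), NA)`.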