Create duplicate rows based on a condition in R

Asked: 2015-03-10 10:35:53

Tags: r duplicates conditional data.table

I have a data.table that looks like this:

library(data.table)

dt <- data.table(ID=c("A","A","B","B"),Amount1=c(100,200,300,400),
                 Amount2=c(1500,1500,2400,2400),Dupl=c(1,0,1,0))

   ID Amount1 Amount2 Dupl
1:  A     100    1500    1
2:  A     200    1500    0
3:  B     300    2400    1
4:  B     400    2400    0

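I need to duplicate every row that has a 1 in the Dupl column and, in the duplicated row, replace the Amount1 value with the Amount2 value. In addition, the duplicated row should get the value 2 in Dupl. This means the result should look like this: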

   ID Amount1 Amount2 Dupl
1:  A     100    1500    1
2:  A    1500    1500    2
3:  A     200    1500    0
4:  B     300    2400    1
5:  B    2400    2400    2
6:  B     400    2400    0

Any help is much appreciated! Kind regards,

6 answers:

Answer 0 (score: 9)

You could try

rbind(dt,dt[Dupl==1][,c('Amount1', 'Dupl') := list(Amount2, 2)])
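
The one-liner above appends the modified copies at the end of the table. If the row order shown in the question matters, one possible way to restore it is sketched below. The helper column rn and the names dups and res are illustrative assumptions, not part of the original answer.

```r
library(data.table)

dt <- data.table(ID = c("A", "A", "B", "B"), Amount1 = c(100, 200, 300, 400),
                 Amount2 = c(1500, 1500, 2400, 2400), Dupl = c(1, 0, 1, 0))

# Remember the original position of every row (rn is a hypothetical helper column).
dt[, rn := .I]

# Subsetting rows returns a new table, so := here does not touch dt.
dups <- dt[Dupl == 1][, c("Amount1", "Dupl") := list(Amount2, 2)]

# Stack both tables, then sort by original position;
# Dupl breaks the tie so that each copy follows its source row.
res <- rbind(dt, dups)[order(rn, Dupl)][, rn := NULL]

# Drop the helper column from the original table again.
dt[, rn := NULL]
res
```

Sorting on rn first and Dupl second relies on the copy always carrying a larger Dupl value (2) than its source row (1), which holds for the data in the question.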

Answer 1 (score: 6)

Using dplyr

library(data.table)
library(dplyr)

#data
dt <- data.table(ID=c("A","A","B","B"),Amount1=c(100,200,300,400),
                 Amount2=c(1500,1500,2400,2400),Dupl=c(1,0,1,0))
#result
rbind(dt,
      dt %>% 
        filter(Dupl==1) %>% 
        mutate(Dupl=2,
               Amount1=Amount2))

#    ID Amount1 Amount2 Dupl
# 1:  A     100    1500    1
# 2:  A     200    1500    0
# 3:  B     300    2400    1
# 4:  B     400    2400    0
# 5:  A    1500    1500    2
# 6:  B    2400    2400    2

Answer 2 (score: 4)

You can rbind a copy of the subsetted data after applying the appropriate transformations:

rbind(dt,copy(dt[Dupl==1])[,Amount1:=Amount2][,Dupl:=Dupl+1])
   ID Amount1 Amount2 Dupl
1:  A     100    1500    1
2:  A     200    1500    0
3:  B     300    2400    1
4:  B     400    2400    0
5:  A    1500    1500    2
6:  B    2400    2400    2
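
A small sketch of why the transformations are applied to a separate table rather than to dt itself: `:=` modifies a data.table by reference, so a second name created with `<-` shares its changes with the original, while a row subset or an explicit copy() does not. The names sub, safe and alias are illustrative only.

```r
library(data.table)

dt <- data.table(ID = c("A", "A", "B", "B"), Amount1 = c(100, 200, 300, 400),
                 Amount2 = c(1500, 1500, 2400, 2400), Dupl = c(1, 0, 1, 0))

# A row subset is a new table, so := on it leaves dt untouched.
sub <- dt[Dupl == 1]
sub[, Amount1 := Amount2]
unchanged_after_subset <- identical(dt$Amount1, c(100, 200, 300, 400))

# Keep an independent copy before demonstrating the aliasing effect.
safe <- copy(dt)

# Plain assignment only creates another name for the same table,
# so := through that name also changes dt.
alias <- dt
alias[, Amount1 := Amount2]
changed_after_alias <- identical(dt$Amount1, dt$Amount2)

# Restore the original values from the independent copy.
dt <- safe
```

Under these assumptions, the extra copy() in the one-liner above looks like a defensive habit rather than a strict requirement, since dt[Dupl==1] already returns a new table.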

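Alternatively, you can create the duplicates by subsetting and then transform the duplicated rows in an intermediate step. This keeps each duplicated row next to its original, as in the example in the question: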

x <- dt[rep(seq(dt[,Dupl]),times=dt[,Dupl==1]+1)]
x[duplicated(x),c("Amount1","Dupl"):=list(Amount2,Dupl+1)]
x
   ID Amount1 Amount2 Dupl
1:  A     100    1500    1
2:  A    1500    1500    2
3:  A     200    1500    0
4:  B     300    2400    1
5:  B    2400    2400    2
6:  B     400    2400    0
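
Note that duplicated(x) compares whole rows, so it would also flag rows that were already identical in the input. A hedged variant is to mark the copies on the row index instead of on the row contents. The names idx and is_copy are assumptions introduced for this sketch.

```r
library(data.table)

dt <- data.table(ID = c("A", "A", "B", "B"), Amount1 = c(100, 200, 300, 400),
                 Amount2 = c(1500, 1500, 2400, 2400), Dupl = c(1, 0, 1, 0))

# Repeat the index of each row twice when Dupl == 1, once otherwise.
idx <- rep(seq_len(nrow(dt)), times = ifelse(dt$Dupl == 1, 2, 1))

# The second occurrence of an index is the newly created copy.
is_copy <- duplicated(idx)

# Expand the table, then overwrite only the copies.
x <- dt[idx]
x[is_copy, c("Amount1", "Dupl") := list(Amount2, 2)]
x
```

Using seq_len(nrow(dt)) instead of seq(dt[,Dupl]) also avoids the special case where seq() is handed a single number and counts up to it.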

Answer 3 (score: 3)

This seems to do what you want. It could probably be refined a bit...

library(splitstackshape)
expandRows(dt, dt$Dupl+1, count.is.col = FALSE)[
  Dupl != 0, Dupl := cumsum(Dupl), by = ID][
    , Amount1 := ifelse(Dupl > 1, Amount2, Amount1)][]
#    ID Amount1 Amount2 Dupl
# 1:  A     100    1500    1
# 2:  A    1500    1500    2
# 3:  A     200    1500    0
# 4:  B     300    2400    1
# 5:  B    2400    2400    2
# 6:  B     400    2400    0

Answer 4 (score: 0)

Using dplyr's left_join to do the duplication. Perhaps not elegant, but it should be easy to understand.

library(data.table)
library(dplyr)

joiner <- data.frame(Dupl = 1, helper_col= 1:2)

dt <- left_join(dt, joiner, by = "Dupl") %>%
  mutate(Dupl = ifelse(helper_col == 2 & !is.na(helper_col), 2, Dupl)) %>%
  select(-helper_col) %>%
  mutate(Amount1 = ifelse(Dupl == 2, Amount2, Amount1))

> dt
  ID Amount1 Amount2 Dupl
1  A     100    1500    1
2  A    1500    1500    2
3  A     200    1500    0
4  B     300    2400    1
5  B    2400    2400    2
6  B     400    2400    0

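Answer 5 (score: 0)

Based on this, but I think this dplyr solution is quite elegant and also scales well, particularly as long as Dupl is always <= 2. Essentially, it takes advantage of tidyr::uncount, which "repeats each row x times according to the value (x) of a given column, thereby lengthening the df." Once the df has been lengthened, dplyr::mutate_at can be used to replace values wherever they equal the lagged value.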

library(tidyverse)
dt %>%
    uncount(Dupl + 1) %>%
    mutate_at(vars(Amount1),
              ~case_when(. == lag(.) ~ Amount2, TRUE ~.)) %>%
    mutate_at(vars(Dupl),
              ~case_when(. == lag(.) ~ 2, TRUE ~.))

#    ID Amount1 Amount2 Dupl
# 1:  A     100    1500    1
# 2:  A    1500    1500    2
# 3:  A     200    1500    0
# 4:  B     300    2400    1
# 5:  B    2400    2400    2
# 6:  B     400    2400    0
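
The lag comparison above would also fire when two neighbouring source rows happen to share the same Amount1 or Dupl value. A possible variant, sketched here on a plain data.frame, uses the .id argument of tidyr::uncount to number the copies of each row explicitly. The column name copy_no and the object names df and res are assumptions made for this example.

```r
library(dplyr)
library(tidyr)

df <- data.frame(ID = c("A", "A", "B", "B"), Amount1 = c(100, 200, 300, 400),
                 Amount2 = c(1500, 1500, 2400, 2400), Dupl = c(1, 0, 1, 0))

res <- df %>%
  # Repeat each row Dupl + 1 times; copy_no counts the copies of each source row.
  uncount(Dupl + 1, .id = "copy_no") %>%
  # Only the second copy of a row is rewritten.
  mutate(Amount1 = if_else(copy_no == 2, Amount2, Amount1),
         Dupl    = if_else(copy_no == 2, 2, Dupl)) %>%
  select(-copy_no)
res
```

Because the copy is identified by its counter rather than by its values, this version should not depend on neighbouring rows being different.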