Expand rows given a condition

Time: 2019-09-17 11:49:42

Tags: r duplicates row

I want to insert duplicate rows when a column has a given value. I have the following dataset:

dataset <- data.frame(id=c("A","A","A","A","B","B","B","B"),
             date=c('2018-05-09 11:30','2018-10-28 01:15','2018-10-28 01:30','2018-12-08 14:15','2018-05-09 11:30','2018-10-28 01:15','2018-10-28 01:30','2018-12-08 14:15'),
             amount=c(10,20,22,14,12,24,26,10)
             )

    id  date                amount
1   A   2018-05-09 11:30    10
2   A   2018-10-28 01:15    20
3   A   2018-10-28 01:30    22
4   A   2018-12-08 14:15    14
5   B   2018-05-09 11:30    12
6   B   2018-10-28 01:15    24
7   B   2018-10-28 01:30    26
8   B   2018-12-08 14:15    10

I want to duplicate the rows that contain a given date and divide their amount by 2. The dates to look for are:

date_change <- c('2018-10-28 01:00','2018-10-28 01:15','2018-10-28 01:30','2018-10-28 01:45')

I should get:

    id  date                amount
1   A   2018-05-09 11:30    10
2   A   2018-10-28 01:15    10
3   A   2018-10-28 01:15    10
4   A   2018-10-28 01:30    11
5   A   2018-10-28 01:30    11
6   A   2018-12-08 14:15    14
7   B   2018-05-09 11:30    12
8   B   2018-10-28 01:15    12
9   B   2018-10-28 01:15    12
10  B   2018-10-28 01:30    13
11  B   2018-10-28 01:30    13
12  B   2018-12-08 14:15    10

I tried expandRows from splitstackshape, but it returns only the duplicated rows.

library(splitstackshape)
fixed <- expandRows(dataset[dataset$date %in% date_change,], 2, count.is.col = FALSE)

3 answers:

Answer 0 (score: 4)

In base R, you can first find where date matches date_change using %in%. Divide the amount of those rows by 2, then duplicate the rows with rep.

i <- dataset$date %in% date_change
within(dataset, amount[i] <- amount[i] / 2)[rep(seq_len(nrow(dataset)), i + 1), ]
#    id             date amount
#1    A 2018-05-09 11:30     10
#2    A 2018-10-28 01:15     10
#2.1  A 2018-10-28 01:15     10
#3    A 2018-10-28 01:30     11
#3.1  A 2018-10-28 01:30     11
#4    A 2018-12-08 14:15     14
#5    B 2018-05-09 11:30     12
#6    B 2018-10-28 01:15     12
#6.1  B 2018-10-28 01:15     12
#7    B 2018-10-28 01:30     13
#7.1  B 2018-10-28 01:30     13
#8    B 2018-12-08 14:15     10
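
The one-liner above can be unpacked into separate steps, which may make it easier to see where the halving and the duplication happen. This is only a sketch of the same idea; the names times, halved and result are illustrative and not part of the original answer.

```r
dataset <- data.frame(
  id = c("A", "A", "A", "A", "B", "B", "B", "B"),
  date = c("2018-05-09 11:30", "2018-10-28 01:15", "2018-10-28 01:30", "2018-12-08 14:15",
           "2018-05-09 11:30", "2018-10-28 01:15", "2018-10-28 01:30", "2018-12-08 14:15"),
  amount = c(10, 20, 22, 14, 12, 24, 26, 10)
)
date_change <- c("2018-10-28 01:00", "2018-10-28 01:15",
                 "2018-10-28 01:30", "2018-10-28 01:45")

# TRUE for the rows whose date is one of the dates to look for
i <- dataset$date %in% date_change

# Number of copies per row: 2 for matching rows, 1 for all others
times <- i + 1L

# Halve the amount of the matching rows only
halved <- dataset
halved$amount[i] <- halved$amount[i] / 2

# Repeat each row index as many times as requested
result <- halved[rep(seq_len(nrow(halved)), times), ]

# Optional: replace row names such as "2.1" with a plain 1..n sequence
rownames(result) <- NULL
```

Because every duplicated row carries half of the original amount, sum(result$amount) equals sum(dataset$amount).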

If you change the line

fixed <- expandRows(dataset[dataset$date %in% date_change,], 2, count.is.col = FALSE)

to

fixed <- splitstackshape::expandRows(dataset, dataset$date %in% date_change+1, count.is.col = FALSE)

it should do what you want. However, amount still needs to be divided.
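
One possible way to handle the division is to divide amount by the per-row count before expanding. This is a sketch, assuming splitstackshape is installed; the names n and halved are illustrative, and the result is assumed to have one row per requested copy.

```r
library(splitstackshape)

dataset <- data.frame(
  id = c("A", "A", "A", "A", "B", "B", "B", "B"),
  date = c("2018-05-09 11:30", "2018-10-28 01:15", "2018-10-28 01:30", "2018-12-08 14:15",
           "2018-05-09 11:30", "2018-10-28 01:15", "2018-10-28 01:30", "2018-12-08 14:15"),
  amount = c(10, 20, 22, 14, 12, 24, 26, 10)
)
date_change <- c("2018-10-28 01:00", "2018-10-28 01:15",
                 "2018-10-28 01:30", "2018-10-28 01:45")

# Copies per row: 2 where the date matches, 1 otherwise
n <- (dataset$date %in% date_change) + 1

# Dividing by n halves the matching rows and leaves the others unchanged
halved <- dataset
halved$amount <- halved$amount / n

# Expand every row of the full dataset by its own count
fixed <- expandRows(halved, n, count.is.col = FALSE)
```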

Answer 1 (score: 3)

Here is an idea via dplyr. We create a logical variable that indicates whether date is in date_change, and add 1 to it (TRUE + 1 = 2 and FALSE + 1 = 1). We then use it in two places: first to divide amount (by either 1 or 2), and then in uncount (i.e., to expand each row as many times as the new variable says):

library(dplyr)

dataset %>% 
 mutate(new = date %in% date_change + 1, 
        amount = amount / new) %>% 
 tidyr::uncount(new)
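
To make the role of the helper column more concrete, the pipeline can be split into two steps so the intermediate values are visible. This is a sketch under the assumption that dplyr and tidyr are installed; step1 and result are illustrative names.

```r
library(dplyr)

dataset <- data.frame(
  id = c("A", "A", "A", "A", "B", "B", "B", "B"),
  date = c("2018-05-09 11:30", "2018-10-28 01:15", "2018-10-28 01:30", "2018-12-08 14:15",
           "2018-05-09 11:30", "2018-10-28 01:15", "2018-10-28 01:30", "2018-12-08 14:15"),
  amount = c(10, 20, 22, 14, 12, 24, 26, 10)
)
date_change <- c("2018-10-28 01:00", "2018-10-28 01:15",
                 "2018-10-28 01:30", "2018-10-28 01:45")

# Step 1: add the helper column and divide amount by it
step1 <- dataset %>%
  mutate(new = date %in% date_change + 1,
         amount = amount / new)
# step1$new    is 1 2 2 1 1 2 2 1
# step1$amount is 10 10 11 14 12 12 13 10

# Step 2: repeat each row `new` times; uncount drops the helper column by default
result <- step1 %>%
  tidyr::uncount(new)
```

Since the helper column is removed by uncount, the result keeps only id, date and amount.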

Answer 2 (score: 2)

We can filter the rows whose date occurs in date_change, divide amount by 2, repeat each of those rows twice, and bind them to the rows whose date is not in date_change.

library(dplyr)

dataset %>%
  filter(!date %in% date_change) %>%
  bind_rows(dataset %>%
              filter(date %in% date_change) %>%
              mutate(amount = amount/2) %>%
              slice(rep(seq_len(n()), each = 2))) %>%
  arrange(id)

#   id             date amount
#1   A 2018-05-09 11:30     10
#2   A 2018-12-08 14:15     14
#3   A 2018-10-28 01:15     10
#4   A 2018-10-28 01:15     10
#5   A 2018-10-28 01:30     11
#6   A 2018-10-28 01:30     11
#7   B 2018-05-09 11:30     12
#8   B 2018-12-08 14:15     10
#9   B 2018-10-28 01:15     12
#10  B 2018-10-28 01:15     12
#11  B 2018-10-28 01:30     13
#12  B 2018-10-28 01:30     13
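
Since arrange(id) sorts by id only, the duplicated rows end up after the untouched rows within each id, as the output above shows. If chronological order matters, one option is to sort by date as well. The sketch below assumes the dates are stored as text in year-month-day hour:minute form, so that text order matches time order; result is an illustrative name.

```r
library(dplyr)

dataset <- data.frame(
  id = c("A", "A", "A", "A", "B", "B", "B", "B"),
  date = c("2018-05-09 11:30", "2018-10-28 01:15", "2018-10-28 01:30", "2018-12-08 14:15",
           "2018-05-09 11:30", "2018-10-28 01:15", "2018-10-28 01:30", "2018-12-08 14:15"),
  amount = c(10, 20, 22, 14, 12, 24, 26, 10)
)
date_change <- c("2018-10-28 01:00", "2018-10-28 01:15",
                 "2018-10-28 01:30", "2018-10-28 01:45")

result <- dataset %>%
  filter(!date %in% date_change) %>%                 # rows left untouched
  bind_rows(dataset %>%
              filter(date %in% date_change) %>%      # rows to duplicate
              mutate(amount = amount / 2) %>%
              slice(rep(seq_len(n()), each = 2))) %>%
  arrange(id, date)                                  # sort within each id by date
```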