Insert rows conditionally based on consecutive values in a column in R

Time: 2018-02-24 13:28:54

Tags: r dataframe dplyr

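I have a data frame, and I want to insert a new row whenever the value in the Event column goes from "A" to "B" on consecutive rows.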

Event   Price   Type    Date    Time

A       100      Sell   27-01-2018 12:00
C       200      Buy    27-01-2018 12:15
C       300      Buy    27-01-2018 12:30
D       350      Sell   27-01-2018 12:31
A       320      Buy    27-01-2018 12:32
B       321      Sell   27-01-2018 12:32
B       220      Buy    27-01-2018 12:34
L       550      Buy    27-01-2018 12:35
A       320      Buy    27-01-2018 12:32
B       320      Sell   27-01-2018 12:32

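If event "B" immediately follows event "A", I want to insert a new row between those two rows. The new row should have the same values as the "B" row, except that its Event is "Z".

Expected data frame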

Event   Price   Type    Date    Time

A       100      Sell   27-01-2018 12:00
C       200      Buy    27-01-2018 12:15
C       300      Buy    27-01-2018 12:30
D       350      Sell   27-01-2018 12:31
A       320      Buy    27-01-2018 12:32
Z       321      Sell   27-01-2018 12:32
B       321      Sell   27-01-2018 12:32
B       220      Buy    27-01-2018 12:34
L       550      Buy    27-01-2018 12:35
A       320      Buy    27-01-2018 12:32
Z       320      Sell   27-01-2018 12:32
B       320      Sell   27-01-2018 12:32

3 Answers:

Answer 0: (score: 4)

Here is an approach using the tidyverse:

library(tidyverse)
df %>%
  mutate(lagE = lag(Event),  #create a lagged Event column
         splt = ifelse(Event == "B" & lagE == "A", TRUE, FALSE),  #flag each B that directly follows an A
         cum = cumsum(splt)) %>% #running count of flags, used as the column to split by
  {split(., .$cum)} %>% #split the data frame
  map(function(x){  #in each list data frame check if first element is B, if it is duplicate it and rename to Z, if not just return the data frame.
    if(x[1,1] == "B"){
      z <- rbind(x[1,], x)
      z[,1] <- as.character(z[,1])
      z[1,1] <- "Z" 
    } else {z <- x}
    z
  }) %>%
  bind_rows() %>% #put back to a data frame
  select(1:5) #remove helper columns

#output
   Event Price Type       Date  Time
1      A   100 Sell 27-01-2018 12:00
2      C   200  Buy 27-01-2018 12:15
3      C   300  Buy 27-01-2018 12:30
4      D   350 Sell 27-01-2018 12:31
5      A   320  Buy 27-01-2018 12:32
6      Z   321 Sell 27-01-2018 12:32
7      B   321 Sell 27-01-2018 12:32
8      B   220  Buy 27-01-2018 12:34
9      L   550  Buy 27-01-2018 12:35
10     A   320  Buy 27-01-2018 12:32
11     Z   320 Sell 27-01-2018 12:32
12     B   320 Sell 27-01-2018 12:32

The problem seems simple, and I'm sure someone will offer a more concise solution.
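One shorter route, given here as a sketch under assumptions and not as a replacement for the code above, is to repeat the row index of every "B" that follows an "A" and then relabel the first copy as "Z". The helper name `insert_z` and the trimmed two-column data frame are made up for illustration. The sketch assumes Event is a character column (not a factor), contains no NA, and that the data frame has at least one row.

```r
# Hypothetical helper: duplicate each "B" row that directly follows an "A" row
# and label the first copy "Z". Assumes df$Event is character with no NA.
insert_z <- function(df) {
  n <- nrow(df)
  # TRUE where this row is "B" and the previous row is "A"
  hit <- c(FALSE, df$Event[-1] == "B" & df$Event[-n] == "A")
  # Row indices, with each flagged row listed twice
  idx <- rep(seq_len(n), times = ifelse(hit, 2L, 1L))
  out <- df[idx, , drop = FALSE]
  # The first of the two copies becomes the "Z" row
  first_copy <- hit[idx] & !duplicated(idx)
  out$Event[first_copy] <- "Z"
  rownames(out) <- NULL
  out
}

# Trimmed copy of the question's data (Event and Price only)
df_small <- data.frame(
  Event = c("A", "C", "C", "D", "A", "B", "B", "L", "A", "B"),
  Price = c(100, 200, 300, 350, 320, 321, 220, 550, 320, 320),
  stringsAsFactors = FALSE
)

res_small <- insert_z(df_small)
res_small$Event
# "A" "C" "C" "D" "A" "Z" "B" "B" "L" "A" "Z" "B"
```

Because `rep()` keeps the rows in their original order, no re-sorting step is needed after the duplication.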

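Answer 1: (score: 4)

Here is an option using base R. We create a logical vector by comparing each 'Event' with the one before it and checking whether the pair is "A" followed by "B". We then use that index to subset the matching rows, set their 'Event' to "Z", and rbind them onto the original dataset. Finally, we reorder the rows so that each "Z" row lands at the position given by the index 'i2'.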

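# df1 is the question's data frame; i1 is TRUE where Event is "B" and the previous Event is "A"
i1 <- with(df1, c(FALSE, Event[-1] == "B" & Event[-nrow(df1)] == "A"))
# positions the new "Z" rows will occupy in the result
i2 <- which(i1) + seq_along(which(i1))-1
# number of rows in the result
n <- sum(i1)+ length(i1)
res <- rbind(df1, transform(df1[i1,], Event = "Z"))[order(c(setdiff(seq_len(n), i2), i2)),]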
row.names(res) <- NULL
res
#   Event Price Type       Date  Time
#1      A   100 Sell 27-01-2018 12:00
#2      C   200  Buy 27-01-2018 12:15
#3      C   300  Buy 27-01-2018 12:30
#4      D   350 Sell 27-01-2018 12:31
#5      A   320  Buy 27-01-2018 12:32
#6      Z   321 Sell 27-01-2018 12:32
#7      B   321 Sell 27-01-2018 12:32
#8      B   220  Buy 27-01-2018 12:34
#9      L   550  Buy 27-01-2018 12:35
#10     A   320  Buy 27-01-2018 12:32
#11     Z   320 Sell 27-01-2018 12:32
#12     B   320 Sell 27-01-2018 12:32
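
The index arithmetic above is easier to follow on a bare vector. The sketch below replays the same steps on just the Event values from the question. The names `event`, `stacked`, and `target` are illustrative and do not appear in the answer.

```r
# Event column from the question, as a plain character vector
event <- c("A", "C", "C", "D", "A", "B", "B", "L", "A", "B")
n0 <- length(event)

# TRUE where the value is "B" and the previous value is "A"
i1 <- c(FALSE, event[-1] == "B" & event[-n0] == "A")
which(i1)
# 6 10

# Each earlier insertion shifts later rows down by one,
# so the k-th new row lands at which(i1)[k] + (k - 1)
i2 <- which(i1) + seq_along(which(i1)) - 1
i2
# 6 11

# Length of the result: original values plus one "Z" per match
n <- n0 + sum(i1)

# Append the "Z" values at the end, as rbind() does with rows
stacked <- c(event, rep("Z", sum(i1)))

# Target position of every stacked element: the originals take the
# slots not reserved by i2, the appended "Z" values take the i2 slots
target <- c(setdiff(seq_len(n), i2), i2)

res_vec <- stacked[order(target)]
res_vec
# "A" "C" "C" "D" "A" "Z" "B" "B" "L" "A" "Z" "B"
```

`order(target)` returns, for each output slot, the position in `stacked` of the element assigned to it, which is why indexing by it puts every "Z" directly before its "B".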

Answer 2: (score: 2)

An alternative tidyverse approach

library(tidyverse)
df %>%
  group_by(G = cumsum(Event == "B" & dplyr::lag(Event, 1, default=NA) == "A")) %>%
  do(rbind(mutate(head(., 1), Event = "Z"), .)) %>%
  ungroup() %>%
  slice(-1) %>%
  select(-G)

# A tibble: 12 x 5
   # Event Price Type  Date       Time 
   # <chr> <int> <chr> <chr>      <chr>
 # 1 A       100 Sell  27-01-2018 12:00
 # 2 C       200 Buy   27-01-2018 12:15
 # 3 C       300 Buy   27-01-2018 12:30
 # 4 D       350 Sell  27-01-2018 12:31
 # 5 A       320 Buy   27-01-2018 12:32
 # 6 Z       321 Sell  27-01-2018 12:32
 # 7 B       321 Sell  27-01-2018 12:32
 # 8 B       220 Buy   27-01-2018 12:34
 # 9 L       550 Buy   27-01-2018 12:35
# 10 A       320 Buy   27-01-2018 12:32
# 11 Z       320 Sell  27-01-2018 12:32
# 12 B       320 Sell  27-01-2018 12:32
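
To see why the pipeline above ends with `slice(-1)`, it may help to look at the grouping variable on its own. The base R sketch below mirrors the grouping and prepending steps on the Event values only. The names `event`, `prev`, `pieces`, and `with_z` are illustrative, and the sketch is a simplified stand-in for the `group_by()` / `do()` calls, not the answer's exact code.

```r
# Event column from the question, as a plain character vector
event <- c("A", "C", "C", "D", "A", "B", "B", "L", "A", "B")

# Previous value, with NA for the first element (like dplyr::lag)
prev <- c(NA, event[-length(event)])

# The group id increases by one at every "B" that directly follows an "A"
G <- cumsum(event == "B" & prev == "A")
G
# 0 0 0 0 0 1 1 1 1 2

# Prepend a "Z" to every group, including group 0
pieces <- lapply(split(event, G), function(g) c("Z", g))
with_z <- unlist(pieces, use.names = FALSE)
with_z
# "Z" "A" "C" "C" "D" "A" "Z" "B" "B" "L" "A" "Z" "B"

# Group 0 does not start with a matching "B", so its leading "Z" is dropped
res_grp <- with_z[-1]
```

Every group from 1 onward starts with a "B" that follows an "A", so the prepended "Z" is wanted there. Only the extra one at the top of group 0 has to go.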

Data

df <- read.table(text="Event   Price   Type    Date    Time
A       100      Sell   27-01-2018 12:00
C       200      Buy    27-01-2018 12:15
C       300      Buy    27-01-2018 12:30
D       350      Sell   27-01-2018 12:31
A       320      Buy    27-01-2018 12:32
B       321      Sell   27-01-2018 12:32
B       220      Buy    27-01-2018 12:34
L       550      Buy    27-01-2018 12:35
A       320      Buy    27-01-2018 12:32
B       320      Sell   27-01-2018 12:32", header=TRUE, stringsAsFactors=FALSE)