I have a data frame in which the value in the Event column sometimes changes from "A" to "B".
Event Price Type Date Time
A 100 Sell 27-01-2018 12:00
C 200 Buy 27-01-2018 12:15
C 300 Buy 27-01-2018 12:30
D 350 Sell 27-01-2018 12:31
A 320 Buy 27-01-2018 12:32
B 321 Sell 27-01-2018 12:32
B 220 Buy 27-01-2018 12:34
L 550 Buy 27-01-2018 12:35
A 320 Buy 27-01-2018 12:32
B 320 Sell 27-01-2018 12:32
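If an event "B" directly follows an event "A", I want to insert a new row between those two rows. The new row should have the same values as the "B" row, except that its Event will be "Z".
Expected data frame: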
Event Price Type Date Time
A 100 Sell 27-01-2018 12:00
C 200 Buy 27-01-2018 12:15
C 300 Buy 27-01-2018 12:30
D 350 Sell 27-01-2018 12:31
A 320 Buy 27-01-2018 12:32
Z 321 Sell 27-01-2018 12:32
B 321 Sell 27-01-2018 12:32
B 220 Buy 27-01-2018 12:34
L 550 Buy 27-01-2018 12:35
A 320 Buy 27-01-2018 12:32
Z 320 Sell 27-01-2018 12:32
B 320 Sell 27-01-2018 12:32
Answer 0 (score: 4)
Here is an approach using the tidyverse:
library(tidyverse)
df %>%
  mutate(lagE = lag(Event, default = ""),    # previous row's Event ("" rather than NA for row 1)
         splt = Event == "B" & lagE == "A",  # TRUE where a "B" directly follows an "A"
         cum  = cumsum(splt)) %>%            # group id used for splitting
  {split(., .$cum)} %>%                      # split the data frame into a list of groups
  map(function(x) {
    # if the group starts with a "B" that follows an "A", duplicate that row
    # and rename the copy to "Z"; otherwise return the group unchanged
    if (x$splt[1]) {
      z <- rbind(x[1, ], x)
      z$Event <- as.character(z$Event)        # in case Event is a factor
      z$Event[1] <- "Z"
    } else {
      z <- x
    }
    z
  }) %>%
  bind_rows() %>%                            # put the groups back into one data frame
  select(1:5)                                # remove the helper columns
#output
Event Price Type Date Time
1 A 100 Sell 27-01-2018 12:00
2 C 200 Buy 27-01-2018 12:15
3 C 300 Buy 27-01-2018 12:30
4 D 350 Sell 27-01-2018 12:31
5 A 320 Buy 27-01-2018 12:32
6 Z 321 Sell 27-01-2018 12:32
7 B 321 Sell 27-01-2018 12:32
8 B 220 Buy 27-01-2018 12:34
9 L 550 Buy 27-01-2018 12:35
10 A 320 Buy 27-01-2018 12:32
11 Z 320 Sell 27-01-2018 12:32
12 B 320 Sell 27-01-2018 12:32
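The grouping step above can be sketched in base R on a small made-up Event vector (ev, prev, starts, grp and bad are illustrative names, not part of the answer). It shows how cumsum() turns the "B after A" flags into group ids, and why the first "previous" value is set to "" rather than NA: a single NA flag makes every later group id NA as well.

```r
ev <- c("A", "C", "A", "B", "B", "A", "B")

# previous Event for each position, using "" (not NA) for the first one
prev <- c("", head(ev, -1))

# TRUE where a "B" directly follows an "A"
starts <- ev == "B" & prev == "A"

# running count: a new group id begins at each such "B"
grp <- cumsum(starts)
grp
# [1] 0 0 0 1 1 1 2

# with NA as the first "previous" value, a leading "B" gives an NA flag,
# and cumsum() then propagates NA to every later position
bad <- c("B", "A", "B") == "B" & c(NA, "B", "A") == "A"
cumsum(bad)
# [1] NA NA NA
```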
The problem seems simple enough; I'm sure someone will come up with a more concise solution.
Answer 1 (score: 4)
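Here is an option using base R. We create a logical vector i1 that is TRUE for each row whose "Event" is "B" while the previous row's "Event" is "A". We then take those rows, change their "Event" to "Z", rbind them to the original dataset, and reorder the result so that each "Z" row lands at its position in the index "i2", just before the matching "B".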
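df1 <- df  # the Data section below names the example data frame df
# i1: TRUE for each "B" row whose preceding row is an "A"
i1 <- with(df1, c(FALSE, Event[-1] == "B" & Event[-nrow(df1)] == "A"))
# i2: final positions of the "Z" rows (each earlier insert shifts later ones down by one)
i2 <- which(i1) + seq_along(which(i1)) - 1
# n: number of rows in the result
n <- sum(i1) + length(i1)
# append the "Z" copies, then reorder so that each "Z" sits just before its "B"
res <- rbind(df1, transform(df1[i1, ], Event = "Z"))[order(c(setdiff(seq_len(n), i2), i2)), ]
row.names(res) <- NULL
res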
# Event Price Type Date Time
#1 A 100 Sell 27-01-2018 12:00
#2 C 200 Buy 27-01-2018 12:15
#3 C 300 Buy 27-01-2018 12:30
#4 D 350 Sell 27-01-2018 12:31
#5 A 320 Buy 27-01-2018 12:32
#6 Z 321 Sell 27-01-2018 12:32
#7 B 321 Sell 27-01-2018 12:32
#8 B 220 Buy 27-01-2018 12:34
#9 L 550 Buy 27-01-2018 12:35
#10 A 320 Buy 27-01-2018 12:32
#11 Z 320 Sell 27-01-2018 12:32
#12 B 320 Sell 27-01-2018 12:32
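The index arithmetic in this answer is easier to follow on a short made-up vector. The sketch below assumes a plain character vector ev in place of the data frame, applies the same i1, i2 and n steps, and uses order() to move each appended "Z" to its final position.

```r
ev <- c("A", "B", "C", "A", "B")

# TRUE where a "B" directly follows an "A" (positions 2 and 5 here)
i1 <- c(FALSE, ev[-1] == "B" & ev[-length(ev)] == "A")

# final positions of the Z's: 2 and 6, because the first Z pushes the second one down
i2 <- which(i1) + seq_along(which(i1)) - 1

# length after insertion
n <- sum(i1) + length(i1)

# target position of every element of c(ev, Z's): originals fill the gaps, Z's go to i2
target <- c(setdiff(seq_len(n), i2), i2)
res <- c(ev, rep("Z", sum(i1)))[order(target)]
res
# [1] "A" "Z" "B" "C" "A" "Z" "B"
```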
Answer 2 (score: 2)
An alternative tidyverse approach:
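library(tidyverse)
df %>%
  # group id that increases at every "B" directly preceded by an "A"
  # (default = "" avoids an NA flag on the first row)
  group_by(G = cumsum(Event == "B" & dplyr::lag(Event, 1, default = "") == "A")) %>%
  # prepend a copy of each group's first row, renamed to "Z"
  do(rbind(mutate(head(., 1), Event = "Z"), .)) %>%
  ungroup() %>%
  slice(-1) %>%   # drop the "Z" added to group 0, which does not start with a "B"
  select(-G)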
# A tibble: 12 x 5
# Event Price Type Date Time
# <chr> <int> <chr> <chr> <chr>
# 1 A 100 Sell 27-01-2018 12:00
# 2 C 200 Buy 27-01-2018 12:15
# 3 C 300 Buy 27-01-2018 12:30
# 4 D 350 Sell 27-01-2018 12:31
# 5 A 320 Buy 27-01-2018 12:32
# 6 Z 321 Sell 27-01-2018 12:32
# 7 B 321 Sell 27-01-2018 12:32
# 8 B 220 Buy 27-01-2018 12:34
# 9 L 550 Buy 27-01-2018 12:35
# 10 A 320 Buy 27-01-2018 12:32
# 11 Z 320 Sell 27-01-2018 12:32
# 12 B 320 Sell 27-01-2018 12:32
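The head(., 1) / slice(-1) trick works because every group except group 0 starts with a flagged "B". A rough base-R analogue on a made-up Event vector (ev, g, pieces and res are illustrative names) prepends a "Z" to every group and then drops the one that lands at the very front:

```r
ev <- c("A", "C", "A", "B", "B", "A", "B")

# same group ids as in the answer: a new group starts at each "B" after an "A"
g <- cumsum(ev == "B" & c("", head(ev, -1)) == "A")

# prepend a "Z" to every group, including group 0
pieces <- lapply(split(ev, g), function(p) c("Z", p))

# group 0 never starts with a flagged "B", so its leading "Z" is removed
res <- unlist(pieces, use.names = FALSE)[-1]
res
# [1] "A" "C" "A" "Z" "B" "B" "A" "Z" "B"
```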
Data
df <- read.table(text="Event Price Type Date Time
A 100 Sell 27-01-2018 12:00
C 200 Buy 27-01-2018 12:15
C 300 Buy 27-01-2018 12:30
D 350 Sell 27-01-2018 12:31
A 320 Buy 27-01-2018 12:32
B 321 Sell 27-01-2018 12:32
B 220 Buy 27-01-2018 12:34
L 550 Buy 27-01-2018 12:35
A 320 Buy 27-01-2018 12:32
B 320 Sell 27-01-2018 12:32", header=TRUE, stringsAsFactors=FALSE)
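Another base-R sketch, not taken from any of the answers above, duplicates the flagged rows with rep() and then renames the first copy of each. The helper name insert_z is hypothetical, and it assumes Event is a character column (as with stringsAsFactors = FALSE in the data above); with a factor column, assigning "Z" would produce NA.

```r
# hypothetical helper: insert a "Z" copy before every "B" that directly follows an "A"
insert_z <- function(df) {
  prev <- c("", head(df$Event, -1))                  # previous Event, "" for the first row
  flag <- df$Event == "B" & prev == "A"              # rows that need a "Z" copy
  idx  <- rep(seq_len(nrow(df)), times = 1 + flag)   # flagged row indices appear twice
  out  <- df[idx, ]
  # duplicated() is TRUE only for the second copy, so this picks the first copy
  out$Event[flag[idx] & !duplicated(idx)] <- "Z"
  row.names(out) <- NULL
  out
}

df <- read.table(text = "Event Price Type Date Time
A 100 Sell 27-01-2018 12:00
C 200 Buy 27-01-2018 12:15
C 300 Buy 27-01-2018 12:30
D 350 Sell 27-01-2018 12:31
A 320 Buy 27-01-2018 12:32
B 321 Sell 27-01-2018 12:32
B 220 Buy 27-01-2018 12:34
L 550 Buy 27-01-2018 12:35
A 320 Buy 27-01-2018 12:32
B 320 Sell 27-01-2018 12:32", header = TRUE, stringsAsFactors = FALSE)

insert_z(df)
```

Using rep() with a per-row times vector keeps every column of the copied row, so only Event has to be changed afterwards.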