How can I extract the first (minimum) entry matching a regular expression, by Name?

Time: 2016-07-26 13:00:33

Tags: regex r

I want to extract, for each Name, the earliest entry that matches a regular expression.

Here is some data:

# Here I define the dates:
dates <- as.Date(as.character(c("2011-01-13",
                           "2011-01-14",
                           "2011-01-15",
                           "2011-01-16",
                           "2011-01-17",
                           "2011-01-13",
                           "2011-01-14",
                           "2011-01-15",
                           "2011-01-16",
                           "2011-01-17",
                           "2011-01-13",
                           "2011-01-14",
                           "2011-01-15",
                           "2011-01-16",
                           "2011-01-17")))
# Here I define the Names
Name <-c("Andy","Andy","Andy","Andy","Andy","Jo","Jo","Jo","Jo","Jo","Me","Me","Me","Me",'Me')
# Here I define the status character
status<- c("ID: 10 -> 1","ID: 11 -> 0","ID: 3 -> 5","ID: 20 -> 4","ID: 1 -> 5","ID: 1 -> 1","ID: 3 -> 2","ID: 20 -> 5","ID: 10 -> 5","ID: 11 -> 5","ID: 12 ->1","ID: 30 -> 2","ID: 30 -> 5","ID: 30 -> 2","ID: 30 -> 5")

# put together
data <- data.frame(Name, dates, status)
# Here is the output with the desired column condition_met, which is 1 for
# the first change of ID from something to 5 within each Name
  Name      dates      status     condition_met
1  Andy 2011-01-13 ID: 10 -> 1     0
2  Andy 2011-01-14 ID: 11 -> 0     0
3  Andy 2011-01-15 ID: 3 -> 5      1
4  Andy 2011-01-16 ID: 20 -> 4     0
5  Andy 2011-01-17 ID: 1 -> 5      0
6    Jo 2011-01-13 ID: 1 -> 1      0
7    Jo 2011-01-14 ID: 3 -> 2      0
8    Jo 2011-01-15 ID: 20 -> 5     1
9    Jo 2011-01-16 ID: 10 -> 5     0
10   Jo 2011-01-17 ID: 11 -> 5     0
11   Me 2011-01-13 ID: 12 -> 1     0
12   Me 2011-01-14 ID: 30 -> 2     0
13   Me 2011-01-15 ID: 30 -> 5     1
14   Me 2011-01-16 ID: 30 -> 2     0
15   Me 2011-01-17 ID: 30 -> 5     0

I tried:

data$condition_met <- ifelse(grepl("-> 5",data$status),1,0)

This produces a condition_met column, but unfortunately it flags every "-> 5" rather than only the earliest (i.e. first) "-> 5" for each Name.

1 answer:

Answer 0 (score: 3)

We can create a function that marks the first element matching the condition, then call it by group using base R, data.table, or dplyr:

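# 1 at the first element containing "-> 5", 0 elsewhere
# (yields NA for the whole group if nothing in it matches)
condition <- function(x) as.integer(seq_along(x) == grep("-> 5", x, fixed = TRUE)[1])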

#base
data$condition_met <- as.integer(with(data, ave(status, Name, FUN=condition)))

#data.table
library(data.table)
setDT(data)[, condition_met := condition(status), by = Name]

#dplyr
library(dplyr)
data %>% group_by(Name) %>% mutate(condition_met = condition(status))
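
As a cross-check of the grouping idea above, here is a minimal self-contained sketch in base R. It is not part of the original answer: the names `first_hit` and `demo` are hypothetical and the small data set is made up for illustration. The variant uses `%in%` instead of `==`, on the assumption that a Name with no "-> 5" at all should get zeros rather than NA.

```r
# Hypothetical helper: 1 at the first element containing "-> 5", 0 elsewhere.
# %in% turns the NA from "no match" into FALSE, so unmatched groups are all 0.
first_hit <- function(x) {
  as.integer(seq_along(x) %in% grep("-> 5", x, fixed = TRUE)[1])
}

# Small illustrative data set (not the asker's data); "Al" never reaches 5.
demo <- data.frame(
  Name   = c("Andy", "Andy", "Andy", "Jo", "Jo", "Al", "Al"),
  status = c("ID: 10 -> 1", "ID: 3 -> 5", "ID: 1 -> 5",
             "ID: 20 -> 5", "ID: 10 -> 5", "ID: 2 -> 1", "ID: 2 -> 3"),
  stringsAsFactors = FALSE
)

# Group row indices by Name so ave() works on integers throughout,
# regardless of whether status is stored as character or factor.
demo$condition_met <- ave(
  seq_along(demo$status), demo$Name,
  FUN = function(i) first_hit(demo$status[i])
)
print(demo)
```

In this sketch only Andy's second row and Jo's first row are flagged, and both of Al's rows stay 0. Passing row indices to `ave()` rather than `status` itself avoids the character-to-integer round trip, at the cost of a slightly longer call.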