I have a character vector, shown below, and I want to extract the dates from it.
Before extracting the dates, I want to create a Date_Flag. I used the following code, but it gives a different output:
check_values <- c("deficit based on wage statement 7/14/ to 7/17/2015",
"Deficit Due: $1205.73 -$879.63= $326.10 x 70%=$228.2",
"Deficit Due for 12 wks pd - 7/14/15 thru 10/5/15;",
"Deficit due to wage statement: 4/22/15 thru 5/12/15",
"depos transcript 7/10/15 for 7/8/15 depos",
"difference owed for 4/25/15-5/22/15",
"tpd 4:30:2015 - 5:22:2015",
"Medical TREATMENT DATES: 6/30/2015 - 6/30/2015",
"4/25/15-5/22/15",
"Medical")
library(data.table)  # provides as.data.table()
check_values <- as.data.table(check_values)
names(check_values) <- "check_memo"
After creating Date_Flag, I want to extract the dates (both parts). Can someone tell me what is wrong with the regular expression above?
Thanks.
Answer 0 (score: 3)
We can use str_count to create the 'Date_Flag': if an element of 'check_memo' contains exactly two complete dates we get TRUE, otherwise FALSE.
library(data.table)
library(stringr)
pat <- "[0-9]{1,2}[/:][0-9]{1,2}[/:][0-9]{2,4}"
check_values[,Date_Flag := str_count(check_memo, pat)==2]
check_values
# check_memo Date_Flag
#1: deficit based on wage statement 7/14/ to 7/17/2015 FALSE
#2: Deficit Due: $1205.73 -$879.63= $326.10 x 70%=$228.2 FALSE
#3: Deficit Due for 12 wks pd - 7/14/15 thru 10/5/15; TRUE
#4: Deficit due to wage statement: 4/22/15 thru 5/12/15 TRUE
#5: depos transcript 7/10/15 for 7/8/15 depos TRUE
#6: difference owed for 4/25/15-5/22/15 TRUE
#7: tpd 4:30:2015 - 5:22:2015 TRUE
#8: Medical TREATMENT DATES: 6/30/2015 - 6/30/2015 TRUE
#9: 4/25/15-5/22/15 TRUE
#10: Medical FALSE
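As a rough cross-check of the counting step, the same pattern can be run with base R alone. This is only a sketch: `memos`, `hits`, `n_dates` and `date_flag` are illustrative names that are not part of the answer, and `gregexpr`/`regmatches` stand in for `str_count`. It shows why the first memo is flagged FALSE: "7/14/" has no year after the second slash, so only one complete date is found.

```r
# Same pattern as in the answer: month, day, year separated by "/" or ":"
pat <- "[0-9]{1,2}[/:][0-9]{1,2}[/:][0-9]{2,4}"

# Three of the memos from the question (illustrative subset)
memos <- c("deficit based on wage statement 7/14/ to 7/17/2015",
           "tpd 4:30:2015 - 5:22:2015",
           "Medical")

# gregexpr() locates every match per element; regmatches() pulls out the text
hits <- regmatches(memos, gregexpr(pat, memos))

# Number of complete dates found in each memo
n_dates <- lengths(hits)
n_dates
# [1] 1 2 0

# Flag only the memos that contain exactly two complete dates
date_flag <- n_dates == 2
date_flag
# [1] FALSE  TRUE FALSE
```

If memos with a single date should also be flagged, the comparison could be relaxed to `n_dates >= 1`, but the two-column extraction that follows in the answer assumes exactly two matches per flagged row.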
If we need to extract the dates, use the same pattern with str_extract_all:
check_values[(Date_Flag), paste0("Date", 1:2) :=
transpose(str_extract_all(check_memo, pat))]
check_values
# check_memo Date_Flag Date1 Date2
# 1: deficit based on wage statement 7/14/ to 7/17/2015 FALSE NA NA
# 2: Deficit Due: $1205.73 -$879.63= $326.10 x 70%=$228.2 FALSE NA NA
# 3: Deficit Due for 12 wks pd - 7/14/15 thru 10/5/15; TRUE 7/14/15 10/5/15
# 4: Deficit due to wage statement: 4/22/15 thru 5/12/15 TRUE 4/22/15 5/12/15
# 5: depos transcript 7/10/15 for 7/8/15 depos TRUE 7/10/15 7/8/15
# 6: difference owed for 4/25/15-5/22/15 TRUE 4/25/15 5/22/15
# 7: tpd 4:30:2015 - 5:22:2015 TRUE 4:30:2015 5:22:2015
# 8: Medical TREATMENT DATES: 6/30/2015 - 6/30/2015 TRUE 6/30/2015 6/30/2015
# 9: 4/25/15-5/22/15 TRUE 4/25/15 5/22/15
#10: Medical FALSE NA NA
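The extracted values are still character strings in mixed layouts ("7/14/15", "4:30:2015", "6/30/2015"). If actual Date objects are wanted, one possible follow-up is sketched below. The helper `to_date` is hypothetical and not part of the answer; it assumes month/day/year order and that every year has either two or four digits.

```r
# Hypothetical helper: convert strings such as "7/14/15" or "4:30:2015" to Date.
# Assumes month/day/year order and a 2- or 4-digit year.
to_date <- function(x) {
  # Normalise the ":" separator to "/" so one set of formats covers both styles
  x <- gsub(":", "/", x, fixed = TRUE)

  # Decide per element whether the year has four digits
  four_digit <- grepl("/[0-9]{4}$", x)

  # Start from an all-NA Date vector and fill each group with its own format
  out <- rep(as.Date(NA), length(x))
  out[four_digit]  <- as.Date(x[four_digit],  format = "%m/%d/%Y")
  out[!four_digit] <- as.Date(x[!four_digit], format = "%m/%d/%y")
  out
}

to_date(c("7/14/15", "4:30:2015", "6/30/2015", NA))
# [1] "2015-07-14" "2015-04-30" "2015-06-30" NA
```

With the table from the answer, this could be applied as `check_values[, Date1 := to_date(Date1)]`, and likewise for `Date2`. Rows where no date was extracted stay NA.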