Given this data.table:
library(data.table)
dt <- data.table(f1 = c(
"stuffstuff-0000097125",
"stuffstuff.abc.0006496679",
"stuffstuff0007517235",
"stuffstuff_xyz.0007280719",
"stuffstuff0005995303",
"stuffstuff_a1b_0000143856",
"stuffstuff0009362407",
"stuffstuff.c44_0009735298"
))
I would like to get these results:
f1 parsed_val
1: stuffstuff-0000097125
2: stuffstuff.abc.0006496679 abc
3: stuffstuff0007517235
4: stuffstuff_xyz.0007280719 xyz
5: stuffstuff0005995303
6: stuffstuff_a1b_0000143856 a1b
7: stuffstuff0009362407
8: stuffstuff.c44_0009735298 c44
Here is my attempt:
rex_pattern <- "(?<=(\\.|\\_|\\-))[A-Za-z0-9]{3}(?=(\\.|\\_|\\-)[0-9]{3,})"
dt[, `:=`(parsed_val = regmatches(f1, regexpr(pattern = rex_pattern, f1, perl = TRUE)))]
However, because of recycling, these are the results I get:
f1 parsed_val
1: stuffstuff-0000097125 abc
2: stuffstuff.abc.0006496679 xyz
3: stuffstuff0007517235 a1b
4: stuffstuff_xyz.0007280719 c44
5: stuffstuff0005995303 abc
6: stuffstuff_a1b_0000143856 xyz
7: stuffstuff0009362407 a1b
8: stuffstuff.c44_0009735298 c44
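The shortened vector can be reproduced in base R, without data.table. This is a minimal sketch that assumes only the eight strings from `dt$f1` and the `rex_pattern` from the attempt. `regexpr()` reports `-1` for inputs with no match, `regmatches()` drops those, and the four surviving values are what got spread over eight rows in the output above. The names `hits`, `parsed_val` and `parsed_val2` are illustrative only.

```r
f1 <- c("stuffstuff-0000097125", "stuffstuff.abc.0006496679",
        "stuffstuff0007517235", "stuffstuff_xyz.0007280719",
        "stuffstuff0005995303", "stuffstuff_a1b_0000143856",
        "stuffstuff0009362407", "stuffstuff.c44_0009735298")
rex_pattern <- "(?<=(\\.|\\_|\\-))[A-Za-z0-9]{3}(?=(\\.|\\_|\\-)[0-9]{3,})"

# One start position per input string; -1 means "no match"
m <- regexpr(rex_pattern, f1, perl = TRUE)

# regmatches() keeps only the matched substrings: 4 values for 8 inputs
hits <- regmatches(f1, m)
print(length(hits))

# Option 1: start with "" everywhere and write each hit back to its own row
parsed_val <- character(length(f1))
parsed_val[m != -1] <- hits

# Option 2: cut the substring directly; for a non-match start = -1 and
# stop = -3, and substring() returns "" whenever stop < start
parsed_val2 <- substring(f1, m, m + attr(m, "match.length") - 1)
print(parsed_val2)
```

Either vector lines up row for row with `f1`, so it can be assigned with `:=` without any recycling.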
I tried using ifelse inside a function to return a placeholder instead when nothing matches:
getMmFromFilename <- function(my_file_name){
rex_pattern <- "(?<=(\\.|\\_|\\-))[A-Za-z0-9]{3}(?=(\\.|\\_|\\-)[0-9]{3,})"
nothing_found <- character(length = 0)
mm <- regmatches(my_file_name, regexpr(pattern = rex_pattern, my_file_name, perl = TRUE))
ifelse(identical(mm, nothing_found), "missing_Mm", mm)
}
dt[, .(parsed_val = getMmFromFilename(f1))]
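But this only returns a single value, abc. The regmatches documentation says: "For vector match data (as obtained from regexpr), empty matches are dropped; for list match data, empty matches give empty components (zero-length character vectors)." I suspect the solution lies somewhere in there, but I haven't worked it out yet...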
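Two things seem to be going on here. First, `identical(mm, nothing_found)` is a single `TRUE`/`FALSE`, and `ifelse()` returns a result with the same length as its test, so the function can only ever return one value (here the first match, `abc`). Second, the documentation hint does point to a fix: `gregexpr()` produces list match data, where a non-match becomes `character(0)` instead of disappearing, so the result keeps one element per input. The following is a sketch of a vectorized rewrite of `getMmFromFilename` along those lines; the `missing_value` argument is an addition for illustration, and passing `""` gives the blank cells shown in the desired output.

```r
getMmFromFilename <- function(my_file_name, missing_value = "missing_Mm") {
  rex_pattern <- "(?<=(\\.|\\_|\\-))[A-Za-z0-9]{3}(?=(\\.|\\_|\\-)[0-9]{3,})"
  # List match data: one element per input; non-matches are character(0)
  mm_list <- regmatches(my_file_name,
                        gregexpr(rex_pattern, my_file_name, perl = TRUE))
  # Decide per element, so the output is exactly as long as the input
  vapply(mm_list,
         function(x) if (length(x) > 0) x[1] else missing_value,
         character(1))
}

f1 <- c("stuffstuff-0000097125", "stuffstuff.abc.0006496679",
        "stuffstuff0007517235", "stuffstuff_xyz.0007280719",
        "stuffstuff0005995303", "stuffstuff_a1b_0000143856",
        "stuffstuff0009362407", "stuffstuff.c44_0009735298")
print(getMmFromFilename(f1))
print(getMmFromFilename(f1, missing_value = ""))
```

In data.table this would be used as `dt[, parsed_val := getMmFromFilename(f1)]`. Note that the original call `dt[, .(parsed_val = ...)]` builds and returns a new table rather than adding a column to `dt`.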
As for the solution, my workflow requires me to use data.table, so a brief explanation of the solution would be a huge help...
Thanks in advance.
Answer (score: 1)
dt[,parser_val:=sub(".*?[._](.*)[._].*|.*","\\1",f1)]
dt
f1 parser_val
1: stuffstuff-0000097125
2: stuffstuff.abc.0006496679 abc
3: stuffstuff0007517235
4: stuffstuff_xyz.0007280719 xyz
5: stuffstuff0005995303
6: stuffstuff_a1b_0000143856 a1b
7: stuffstuff0009362407
8: stuffstuff.c44_0009735298 c44
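How the `sub()` pattern works, as far as I can tell: the first alternative `.*?[._](.*)[._].*` lazily skips to the first `.` or `_`, captures greedily up to the last `.` or `_`, and consumes the rest, so replacing the whole string with `\\1` leaves only the captured part. Strings without two such delimiters (including the one that only has `-`) fall through to the second alternative `.*`, where group 1 never participates, so the replacement is empty. The sketch below runs the same pattern on the plain vector with `perl = TRUE`, chosen here so that the alternatives are tried strictly left to right. The extra string `"stuff.a.b_0001"` is a made-up input showing that, with more than two delimiters, the capture spans everything between the first and the last one.

```r
f1 <- c("stuffstuff-0000097125", "stuffstuff.abc.0006496679",
        "stuffstuff0007517235", "stuffstuff_xyz.0007280719",
        "stuffstuff0005995303", "stuffstuff_a1b_0000143856",
        "stuffstuff0009362407", "stuffstuff.c44_0009735298")

# Alternative 1: skip to the first [._], capture (greedy) up to the last [._]
# Alternative 2: '.*' matches anything else; group 1 is unset, so "" is substituted
parser_val <- sub(".*?[._](.*)[._].*|.*", "\\1", f1, perl = TRUE)
print(parser_val)

# Hypothetical input with three delimiters: the capture is "a.b", not "a"
x <- sub(".*?[._](.*)[._].*|.*", "\\1", "stuff.a.b_0001", perl = TRUE)
print(x)
```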
If you want to use regmatches, you can use the pattern "(?<=[._]).*(?=[._])|$" with perl = TRUE:
dt[, parser_val := regmatches(f1, regexpr("(?<=[._]).*(?=[._])|$", f1, perl = TRUE))]
> dt
f1 parser_val
1: stuffstuff-0000097125
2: stuffstuff.abc.0006496679 abc
3: stuffstuff0007517235
4: stuffstuff_xyz.0007280719 xyz
5: stuffstuff0005995303
6: stuffstuff_a1b_0000143856 a1b
7: stuffstuff0009362407
8: stuffstuff.c44_0009735298 c44
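Why the `|$` alternative fixes the length problem: for a string with no `.` or `_`, the first alternative can never match, but `$` always matches the empty string at the end. `regexpr()` therefore reports a zero-length match (`match.length` of 0) instead of `-1`, and `regmatches()` keeps it as `""`, so nothing is dropped. A minimal base-R check on the same eight strings:

```r
f1 <- c("stuffstuff-0000097125", "stuffstuff.abc.0006496679",
        "stuffstuff0007517235", "stuffstuff_xyz.0007280719",
        "stuffstuff0005995303", "stuffstuff_a1b_0000143856",
        "stuffstuff0009362407", "stuffstuff.c44_0009735298")

m <- regexpr("(?<=[._]).*(?=[._])|$", f1, perl = TRUE)
# No -1 entries: rows without delimiters get a zero-length match at the end
print(attr(m, "match.length"))

# Every input has a match (possibly empty), so the result keeps all 8 rows
parser_val <- regmatches(f1, m)
print(length(parser_val))
```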