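I have a string column (in a data.table) that I need to parse based on a pattern (the text between "-" characters) and a defined, but variable, instance number of that pattern. I'm not sure how to approach this with a regular expression: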
> test <- c("AAA-bb-ccc", "abcd-efgh","blah", "blah-blah-blah-blah")
Suppose the predefined instance number is i.
> i = 1
> output
"AAA" "abcd" "blah" "blah"
> i = 2
> output
"bb" "efgh" "" "blah"
> i = 3
> output
"ccc" "" "" "blah"
How would I achieve this with a generic regular expression?
Answer 0 (score: 1)
We can create a function that splits on "-" and returns the i-th value.
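A minimal sketch of such a function, assuming base R only. The name `nth_piece` and its `sep` argument are hypothetical, chosen for illustration, and returning "" for strings with fewer than i pieces is an assumption taken from the desired output in the question.

```r
# Hypothetical helper: return the i-th "-"-separated piece of each string,
# or "" when a string has fewer than i pieces.
nth_piece <- function(x, i, sep = "-") {
  # fixed = TRUE treats sep as a literal string rather than a regex
  pieces <- strsplit(x, split = sep, fixed = TRUE)
  vapply(pieces, function(p) if (length(p) >= i) p[i] else "", character(1))
}

test <- c("AAA-bb-ccc", "abcd-efgh", "blah", "blah-blah-blah-blah")
nth_piece(test, 1)
nth_piece(test, 2)
nth_piece(test, 3)
```

Because the function is vectorised over `x`, it could be applied directly to a data.table column, for example inside `:=`.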
Answer 1 (score: 1)
For i = 3, you can try
unlist(lapply(strsplit(test,split = '-'),'[',3))
[1] "ccc" NA NA "blah"
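Note that this returns NA where the question asks for an empty string. One possible way to close that gap, sketched with base R only (the variable name `third` is just for illustration):

```r
test <- c("AAA-bb-ccc", "abcd-efgh", "blah", "blah-blah-blah-blah")

# Take the 3rd piece of each split; indexing past the end yields NA
third <- vapply(strsplit(test, split = "-", fixed = TRUE), `[`, character(1), 3)

# Replace the NAs with empty strings to match the desired output
third[is.na(third)] <- ""
third
```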
Answer 2 (score: 1)
We can also use tokenize_regex from the tokenizers package, then data.table::transpose the result and cbind the relevant columns into a data.table.
test <- c("AAA-bb-ccc", "abcd-efgh","blah", "blah-blah-blah-blah")
library(tokenizers)
library(data.table)
test <- transpose(tokenize_regex(test, "-"), fill = "")
i <- 1:3
as.data.table(do.call(cbind, test[i]))
#     V1   V2   V3
#1:  AAA   bb  ccc
#2: abcd efgh
#3: blah
#4: blah blah blah