I am splitting comma-separated strings, but I want to ignore commas that appear between quotes. Here is an example:
library(data.table)
dataset <- data.frame(str=c("USATW,\"USA Technologies, Inc Warrants\",Q" ,
"DUSA,DUSA Pharmaceuticals Inc,Q"))
#1 USATW,"USA Technologies, Inc Warrants",Q
#2 DUSA,DUSA Pharmaceuticals Inc,Q
setDT(dataset)[, c("Symbol","Security Name","Market Category") :=
tstrsplit(str, ",", fixed=TRUE)]
#  Symbol            Security Name Market Category
#1  USATW        "USA Technologies   Inc Warrants"
#2   DUSA DUSA Pharmaceuticals Inc               Q
The first row should instead be:
#1 USATW "USA Technologies, Inc Warrants" Q
There are similar posts, but they cover other programming languages.
Answer 0: (score: 5)
Try read.table. No packages are needed.
read.table(text = as.character(dataset$str), sep = ",", as.is = TRUE,
col.names = c("Symbol", "Security Name", "Market Category"), check.names = FALSE)
which gives:
  Symbol                  Security Name Market Category
1  USATW USA Technologies, Inc Warrants               Q
2   DUSA       DUSA Pharmaceuticals Inc               Q
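If a name can contain an apostrophe, the default quote setting of read.table (which treats both " and ' as quote characters) may get in the way. A minimal sketch, assuming only double quotes delimit fields; the second input row is made up for illustration and is not real data:

```r
# The second row is hypothetical, added only to show an apostrophe in a name
x <- c('USATW,"USA Technologies, Inc Warrants",Q',
       "XYZ,O'Example Holdings,Q")

# quote = "\"" restricts quoting to double quotes, so the apostrophe is kept
# as an ordinary character
res <- read.table(text = x, sep = ",", quote = "\"", as.is = TRUE,
                  col.names = c("Symbol", "Security Name", "Market Category"),
                  check.names = FALSE)
print(res)
```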
Answer 1: (score: 3)
This regex splits on commas that are outside quotes and keeps the quotes in the result:
library(data.table)
dataset <- data.frame(str=c("USATW,\"USA Technologies, Inc Warrants\",Q" ,
"DUSA,DUSA Pharmaceuticals Inc,Q"))
setDT(dataset)[, c("Symbol","Security Name","Market Category") :=
tstrsplit(str, '(,)(?=(?:[^"]|"[^"]*")*$)', perl = TRUE)]
#                                         str Symbol                    Security Name Market Category
# 1: USATW,"USA Technologies, Inc Warrants",Q  USATW "USA Technologies, Inc Warrants"               Q
# 2:          DUSA,DUSA Pharmaceuticals Inc,Q   DUSA         DUSA Pharmaceuticals Inc               Q
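To see why the pattern works, it can help to run it through base R's strsplit() on a single string. The sketch below is only an illustration: the variable names (x, pattern, parts, clean) are made up here, the capturing group around the comma is dropped because it is not needed for splitting, and it assumes that quotes are always balanced and never escaped inside a field.

```r
# One input line from the question
x <- 'USATW,"USA Technologies, Inc Warrants",Q'

# ,                    the comma to split on
# (?= ... $)           lookahead: the rest of the string, up to its end,
# (?:[^"]|"[^"]*")*    must consist only of non-quote characters or
#                      complete "quoted" chunks, i.e. the comma is not
#                      inside an open pair of quotes
pattern <- ',(?=(?:[^"]|"[^"]*")*$)'

parts <- strsplit(x, pattern, perl = TRUE)[[1]]
print(parts)

# Optionally drop the surrounding quotes that the split keeps
clean <- gsub('"', "", parts, fixed = TRUE)
print(clean)
```

Because the lookahead rescans the remainder of the string for every comma, this approach may be slow on very long lines.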