How to split a string in R while ignoring delimiters inside quoted substrings?

Asked: 2016-04-21 13:39:40

Tags: r split data.table

I am splitting comma-separated strings, but I want to ignore commas that appear between quotes. Here is an example:

library(data.table)
dataset <- data.frame(str=c("USATW,\"USA Technologies, Inc Warrants\",Q" ,
                            "DUSA,DUSA Pharmaceuticals Inc,Q"))

#1   USATW,"USA Technologies, Inc Warrants",Q
#2   DUSA,DUSA Pharmaceuticals Inc,Q

setDT(dataset)[, c("Symbol","Security Name","Market Category") :=
                    tstrsplit(str, ",", fixed=TRUE)]


#   Symbol    Security Name               Market Category
#1  USATW    "USA Technologies            Inc Warrants"
#2  DUSA      DUSA Pharmaceuticals Inc    Q

The first row should instead be:

#1  USATW    "USA Technologies, Inc Warrants"  Q

There are similar posts, but they are for other programming languages.

2 answers:

Answer 0: (score: 5)

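Try read.table; no packages are needed. By default it treats a double-quoted field as a single value, so the comma inside the quotes is not used as a separator.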

read.table(text = as.character(dataset$str), sep = ",", as.is = TRUE,   
  col.names = c("Symbol", "Security Name", "Market Category"), check.names = FALSE)

which gives:

  Symbol                  Security Name Market Category
1  USATW USA Technologies, Inc Warrants               Q
2   DUSA       DUSA Pharmaceuticals Inc               Q
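
One caveat worth sketching: read.table's default quote argument also treats the apostrophe as a quote character, so a name containing an apostrophe may break the parse. The example below is a minimal sketch that restricts quoting to double quotes; the third row (XYZ, O'Brien Holdings) is a made-up value added only for illustration and is not part of the original data.

```r
# Input strings; the third row is hypothetical and only there to show the
# apostrophe case.
str <- c("USATW,\"USA Technologies, Inc Warrants\",Q",
         "DUSA,DUSA Pharmaceuticals Inc,Q",
         "XYZ,O'Brien Holdings,G")

# quote = "\"" makes only double quotes act as quoting characters,
# so the apostrophe in the third row is read as ordinary text.
parsed <- read.table(text = str, sep = ",", quote = "\"", as.is = TRUE,
                     col.names = c("Symbol", "Security Name", "Market Category"),
                     check.names = FALSE)

print(parsed)
```

The quoted comma in the first row stays inside "Security Name", and the surrounding quotes are dropped from the value.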

Answer 1: (score: 3)

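This regex splits on commas and keeps the quotes. The lookahead accepts a comma only when the rest of the string consists of unquoted characters and complete quoted sections, so a comma inside a quoted field is never matched: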

library(data.table)
dataset <- data.frame(str=c("USATW,\"USA Technologies, Inc Warrants\",Q" ,
                            "DUSA,DUSA Pharmaceuticals Inc,Q"))

setDT(dataset)[, c("Symbol","Security Name","Market Category") :=
                 tstrsplit(str, '(,)(?=(?:[^"]|"[^"]*")*$)', perl = TRUE)]

#                                         str Symbol                    Security Name Market Category
# 1: USATW,"USA Technologies, Inc Warrants",Q  USATW "USA Technologies, Inc Warrants"               Q
# 2:          DUSA,DUSA Pharmaceuticals Inc,Q   DUSA         DUSA Pharmaceuticals Inc               Q
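
If the quotes themselves are not wanted in the result, they can be stripped after splitting. The following is a minimal base R sketch of that idea, using strsplit with the same lookahead (minus the unneeded capture group); the variable names x, pattern and parts are just illustrative.

```r
x <- "USATW,\"USA Technologies, Inc Warrants\",Q"

# Match a comma only if everything after it, up to the end of the string,
# is made of non-quote characters or complete "..." sections.
pattern <- ',(?=(?:[^"]|"[^"]*")*$)'

parts <- strsplit(x, pattern, perl = TRUE)[[1]]

# Drop one leading and one trailing double quote from each piece.
parts <- gsub('^"|"$', "", parts)

print(parts)
```

This assumes quotes are balanced and that fields do not contain escaped double quotes; for data that may violate either assumption, a CSV parser such as read.table is likely the safer choice.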