Splitting a character object, problem with {

Asked: 2014-11-28 14:53:01

Tags: r

I need to split a character object named C, which looks like this:

"{TV}{Property}{Furniture}{Car or Van}{Phone}{Computer or Tablet}{Holiday}{None of the above}"

First, I tried using strsplit:

D<-strsplit(C[1], split = "}")

That works, and it returns:

[1] "{TV"                 "{Property"           "{Furniture"          "{Car or Van"         "{Phone"              "{Computer or Tablet" "{Holiday"           
[8] "{None of the above"    

But I also want to get rid of the remaining "{". When I try to do that, though, R gets "confused" by the curly brace:

E<-unlist(strsplit(D[[1]], split="{"))
Error in strsplit(D[[1]], split = "{") : invalid regular expression '{', reason 'Missing '}''        

Any suggestions?

3 answers:

Answer 0: (score: 3)

You can escape the braces, i.e. (\\{|\\}), or put them in a character class, [{}]:

 D <- strsplit(C, "[{}]")[[1]]
 D[nzchar(D)]
 #[1] "TV"                 "Property"           "Furniture"         
 #[4] "Car or Van"         "Phone"              "Computer or Tablet"
 #[7] "Holiday"            "None of the above" 
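
The two spellings mentioned above can be compared side by side. This is a minimal sketch, assuming C holds the single string from the question; the names by_escape, by_class, and literal are made up for illustration. The last part uses fixed = TRUE, which the answer does not mention: it is a standard argument of strsplit and sub that treats the pattern as a literal string, and it is one way to repair the second step attempted in the question.

```r
# Assumption: C is the single string shown in the question.
C <- "{TV}{Property}{Furniture}{Car or Van}{Phone}{Computer or Tablet}{Holiday}{None of the above}"

# Escaped alternation: "\\{" in R source is the regex \{, a literal brace.
by_escape <- strsplit(C, "\\{|\\}")[[1]]

# Character class: braces are literal inside [...], so no escaping is needed.
by_class <- strsplit(C, "[{}]")[[1]]

# Both leave empty strings where two delimiters are adjacent; drop them.
by_escape <- by_escape[nzchar(by_escape)]
by_class <- by_class[nzchar(by_class)]
identical(by_escape, by_class)

# Not from the answer: fixed = TRUE makes "}" and "{" plain characters.
D <- strsplit(C, "}", fixed = TRUE)[[1]]
literal <- sub("{", "", D, fixed = TRUE)
identical(literal, by_class)
```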

Or

  strsplit(C, "\\{|}\\{|}")[[1]][-1]
  #[1] "TV"                 "Property"           "Furniture"         
  #[4] "Car or Van"         "Phone"              "Computer or Tablet"
  #[7] "Holiday"            "None of the above" 

Another option:

  regmatches(C,gregexpr("[^{}]+", C))[[1]]
  #[1] "TV"                 "Property"           "Furniture"         
  #[4] "Car or Van"         "Phone"              "Computer or Tablet"
  #[7] "Holiday"            "None of the above" 

Or

  library(stringr)
  str_extract_all(C, '[^{}]+')[[1]]
  #[1] "TV"                 "Property"           "Furniture"         
  #[4] "Car or Van"         "Phone"              "Computer or Tablet"
  #[7] "Holiday"            "None of the above" 

Or

  library(stringi)
  stri_extract_all_regex(C, '[^{}]+')[[1]]
  #[1] "TV"                 "Property"           "Furniture"         
  #[4] "Car or Van"         "Phone"              "Computer or Tablet"
  #[7] "Holiday"            "None of the above" 

Or

  library(qdap)
  unname(bracketXtract(C, 'curly'))
  #[1] "TV"                 "Property"           "Furniture"         
  #[4] "Car or Van"         "Phone"              "Computer or Tablet"
  #[7] "Holiday"            "None of the above" 
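
The question indexes C[1], which suggests C may hold more than one string; that is an assumption, since only one string is shown. If so, the base R extraction above already works element by element once the trailing [[1]] is dropped. A minimal sketch with a made-up two-element vector:

```r
# Hypothetical two-element vector; the question only shows one string.
C <- c("{TV}{Phone}", "{Holiday}{None of the above}")

# gregexpr() finds every run of non-brace characters in each element;
# regmatches() returns a list with one character vector per element of C.
items <- regmatches(C, gregexpr("[^{}]+", C))

items[[1]]
items[[2]]
```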

Answer 1: (score: 3)

Using only strsplit, you can do:

strsplit(C, "[{}]+")[[1]][-1]
# [1] "TV"                 "Property"           "Furniture"         
# [4] "Car or Van"         "Phone"              "Computer or Tablet"
# [7] "Holiday"            "None of the above" 

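strsplit works by adding the text to the left of each match to the output, then removing the match and everything to its left before searching again. Because the string begins with one of the characters we are splitting on, the first element of the result is an empty string, so we only need to drop that first element (which is what [-1] does).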
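
The behaviour described here can be checked on a small input. This is a sketch with made-up strings (with_lead, no_lead, and safe are illustrative names only). It also shows that [-1] is only safe when the string really does start with a delimiter; filtering with nzchar does not rely on that.

```r
# The string starts with a delimiter, so the first token is the empty string.
with_lead <- strsplit("{a}{b}", "[{}]+")[[1]]

# A match at the very end does not add a trailing empty string.
length(with_lead)

# Hypothetical input without a leading brace: [-1] would now drop real data.
no_lead <- strsplit("a}{b}", "[{}]+")[[1]]
no_lead[-1]

# nzchar() removes empty tokens regardless of where they occur.
safe <- no_lead[nzchar(no_lead)]
```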

Answer 2: (score: 0)

Another solution, which splits first and then cleans up the leftover braces:

gsub("[{}]","",strsplit(C,"\\}\\{")[[1]])

[1] "TV"                 "Property"           "Furniture"          "Car or Van"        
[5] "Phone"              "Computer or Tablet" "Holiday"            "None of the above"