Removing \" from a string

Posted: 2015-09-18 06:35:20

Tags: regex r

I'm having trouble removing \" from text.

Here is a sample of my data:

Date          Text
15/03/2015    \"My name is Jane. I \" am a girl.
20/03/2015    Hi, \"I am bored\". Are you\"?

I want to get this output (with the \" removed):

Date          Text
15/03/2015    My name is Jane. I am a girl.
20/03/2015    Hi, I am bored. Are you?
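A minimal base-R sketch of the transformation shown above. It assumes, as both answers do, that the `\` is only R's print escaping and the strings contain plain `"` characters, and it assumes a data frame with hypothetical columns `Date` and `Text`. Dropping the quotes alone is not quite enough: removing `\" ` from "I \" am" leaves two adjacent spaces, so the sketch also squeezes repeated spaces and trims the ends.

```r
# Hypothetical data frame mirroring the sample above
df <- data.frame(
  Date = c("15/03/2015", "20/03/2015"),
  Text = c("\"My name is Jane. I \" am a girl.",
           "Hi, \"I am bored\". Are you\"?"),
  stringsAsFactors = FALSE
)

# Remove every literal double quote (fixed = TRUE: no regex interpretation)
df$Text <- gsub("\"", "", df$Text, fixed = TRUE)
# Removing the quote can leave two adjacent spaces; squeeze runs to one
df$Text <- gsub(" +", " ", df$Text)
# Drop any leading or trailing whitespace
df$Text <- trimws(df$Text)

print(df)
```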

Here is one of the pieces of code I have tried:

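library(tm)  # provides Corpus, DirSource, DocumentTermMatrix, findFreqTerms

text <- c(" \"My name is Jane. I \" am a girl.",
          "Hi, \"I am bored\". Are you\"? ")
# Remove every character that is not alphanumeric, whitespace, or one of ? | . ,
text <- gsub("[^[:alnum:][:space:]?|.|,]", "", text, perl = TRUE)

# Build a corpus from the text files in ~/Desktop/Demo/Corpus
# (note: this reads the files directly, so the gsub() above does not affect it)
cname <- file.path("~", "Desktop", "Demo", "Corpus")
length(dir(cname))
dir(cname)
a <- Corpus(DirSource(cname))
test <- DocumentTermMatrix(a)
findFreqTerms(test)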

The output I get is:

[1]\"My   
[2]name
[3]is
[4]Jane
[5]I
[6]\"
[7]am
[8]a
[9]girl.
[10]Hi,
[11]\"I   
[12]am
[13]bored\".
[14]Are
[15]you\"?

2 answers:

Answer 0 (score: 4)

You need to escape both the backslash and the quote. Maybe try this:

text <- c(" \"My name is Jane. I \" am a girl.",
          "Hi, \"I am bored\". Are you\"? ")
output <- gsub("\\\"","",text)
output
[1] " My name is Jane. I  am a girl." "Hi, I am bored. Are you? " 
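The escaping in this answer works in two layers: R's string-literal parser first turns `"\\\""` into the two-character pattern `\"`, and the regex engine then reads `\"` as an escaped (literal) quote. A short sketch illustrating this; the names `pattern`, `a` and `b` are only for illustration. Because `"` is not a regex metacharacter, a bare quote written inside single quotes gives the same result.

```r
text <- c(" \"My name is Jane. I \" am a girl.",
          "Hi, \"I am bored\". Are you\"? ")

pattern <- "\\\""      # R literal for the 2-character regex  \"
cat(pattern, "\n")     # shows the pattern as the regex engine receives it
nchar(pattern)         # 2: a backslash followed by a double quote

# In the regex, \" is an escaped quote, so it matches a literal "
a <- gsub(pattern, "", text)
# A bare quote needs no regex escape; single quotes avoid the R-level escape
b <- gsub('"', "", text)
identical(a, b)        # TRUE
```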

Answer 1 (score: 2)

text <- c(" \"My name is Jane. I \" am a girl.",
          "Hi, \"I am bored\". Are you\"? ")

# fixed = TRUE matches the quote as a literal string, so no regex escaping is needed
step1 = gsub('"', '', text, fixed = TRUE)
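Removing the quotes alone still leaves the leading and trailing spaces from the input and a double space where `\" ` used to be, so the result does not yet match the desired output in the question. A possible extra step, here called `step2` (a hypothetical name, not part of the original answer), collapses whitespace runs and trims both ends using only base R:

```r
text <- c(" \"My name is Jane. I \" am a girl.",
          "Hi, \"I am bored\". Are you\"? ")

# Literal (non-regex) removal of every double quote
step1 <- gsub('"', '', text, fixed = TRUE)

# Hypothetical extra step: collapse whitespace runs to one space, trim both ends
step2 <- trimws(gsub("[[:space:]]+", " ", step1))
step2
```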