I am having trouble removing \" from text.
Here is a sample of my data:
Date Text
15/03/2015 \"My name is Jane. I \" am a girl.
20/03/2015 Hi, \"I am bored\". Are you\"?
I would like to get this output (with the \" removed):
Date Text
15/03/2015 My name is Jane. I am a girl.
20/03/2015 Hi, I am bored. Are you?
Here is one of the code snippets I have tried:
text <- c(" \"My name is Jane. I \" am a girl.",
"Hi, \"I am bored\". Are you\"? ")
text <- gsub("[^[:alnum:][:space:]?|.|,]", "", text, perl = TRUE)
library(tm)  # Corpus, DirSource, DocumentTermMatrix, findFreqTerms
cname <- file.path("~", "Desktop", "Demo", "Corpus")
length(dir(cname))
dir(cname)
a <- Corpus(DirSource(cname))
test <- DocumentTermMatrix(a)
findFreqTerms(test)
The output I get is:
[1]\"My
[2]name
[3]is
[4]Jane
[5]I
[6]\"
[7]am
[8]a
[9]girl.
[10]Hi,
[11]\"I
[12]am
[13]bored\".
[14]Are
[15]you\"?
Answer 0 (score: 4)
You need to escape both the backslash and the quote. Maybe try this:
text <- c(" \"My name is Jane. I \" am a girl.",
"Hi, \"I am bored\". Are you\"? ")
output <- gsub("\\\"","",text)
output
[1] " My name is Jane. I  am a girl." "Hi, I am bored. Are you? "
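It may help to see why this works. In an R string literal, \" is only the way a double quote is written inside double-quoted source code; the string itself holds a bare " with no backslash in it. Here is a minimal sketch, assuming the same `text` vector as in the question, that checks this with `cat()` and `nchar()`. The final whitespace cleanup (`gsub("\\s+", ...)` plus `trimws()`) is an extra step I am assuming you want, since removing a quote that sat between two spaces leaves a doubled space.

```r
text <- c(" \"My name is Jane. I \" am a girl.",
          "Hi, \"I am bored\". Are you\"? ")

# print() shows the escaped form; cat() shows the raw characters
cat(text[1], "\n")

# "\"" is a single character, so the data contains no backslash at all
quote_len <- nchar("\"")
has_backslash <- grepl("\\", text, fixed = TRUE)

# Remove the quotes as a literal character (no regex needed)
no_quotes <- gsub("\"", "", text, fixed = TRUE)

# Assumed extra step: collapse runs of whitespace and trim both ends
cleaned <- trimws(gsub("\\s+", " ", no_quotes))
cleaned
```

If the \" only ever appears in printed output, nothing is wrong with the data; it is just how `print()` displays a quote inside a string.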
Answer 1 (score: 2)
text <- c(" \"My name is Jane. I \" am a girl.",
"Hi, \"I am bored\". Are you\"? ")
step1 = gsub('"','', text, fixed = TRUE)
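With `fixed = TRUE` the pattern is treated as a literal string rather than a regular expression, so the quote needs no regex escaping. Since the question shows the data as a table with Date and Text columns, the same call could be applied to a column. This is only a sketch: the data frame name `df` is hypothetical, as the question does not say how the data is actually stored.

```r
# Hypothetical data frame mirroring the sample in the question
df <- data.frame(
  Date = c("15/03/2015", "20/03/2015"),
  Text = c("\"My name is Jane. I \" am a girl.",
           "Hi, \"I am bored\". Are you\"?"),
  stringsAsFactors = FALSE
)

# fixed = TRUE matches the literal " character; no regex is involved
df$Text <- gsub('"', '', df$Text, fixed = TRUE)

# Confirm that no double quote remains in the column
remaining <- any(grepl('"', df$Text, fixed = TRUE))
df
```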