mycorpus <- tm_map(mycorpus, content_transformer(remove_url))
Warning message:
In tm_map.SimpleCorpus(mycorpus, content_transformer(remove_url)) :
  transformation drops documents
mycorpus <- tm_map(mycorpus, removePunctuation)
Warning message:
In tm_map.SimpleCorpus(mycorpus, removePunctuation) :
  transformation drops documents
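The question does not show how remove_url is defined. A minimal sketch of a function that would fit content_transformer(), assuming a URL starts with "http" and runs until the next whitespace character (the name comes from the question, the body is hypothetical):

```r
# Hypothetical definition: the question never shows remove_url.
# Deletes every run that starts with "http" and continues up to the next whitespace.
remove_url <- function(x) gsub("http[^[:space:]]*", "", x)

remove_url("new post http://t.co/AbC123 #rstats")
```

In the question's pipeline it would be applied exactly as in the first line, via tm_map(mycorpus, content_transformer(remove_url)). The spaces on either side of the URL are kept, so a later whitespace-stripping step is still useful.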
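Also, when I try to view some of the tweets that contain any symbols, I get:
Error in nchar(output) : invalid multibyte string, element 1
mycorpus <- tm_map(mycorpus, content_transformer(tolower))
Error in FUN(content(x), ...) : invalid input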
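The "invalid multibyte string" and "invalid input" errors usually mean that some tweets contain bytes that are not valid in the expected encoding. One common workaround, not taken from the answers here, is to clean the raw text with base R's iconv() before building the corpus. A sketch, assuming the tweets are supposed to be UTF-8 and that dropping every non-ASCII character is acceptable (this also removes emoji and accented letters); the sample tweets are hypothetical:

```r
# Hypothetical tweets; "\xe9" is a single Latin-1 byte, which is not valid UTF-8
tweets <- c("Great game!! https://t.co/abc", "caf\xe9 later")
validUTF8(tweets)

# Convert from UTF-8 to ASCII; sub = "" replaces every byte that cannot be
# converted (including invalid ones) with nothing, so later steps such as
# tolower() or nchar() only ever see clean input
clean <- iconv(tweets, from = "UTF-8", to = "ASCII", sub = "")
validUTF8(clean)
tolower(clean)
```

The cleaned vector can then be passed to the corpus constructor in place of the raw text.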
Answer 1 (score: 0)
x = c("https://stackoverflow.com/questions/51582369/how-can-i-remove-punctuations-and-numbers-in-text-from-data-frame-file-in-r",
      "http://stackoverflow.com/questions/51582369/how-can-i-remove-punctuations-and-numbers-in-text-from-data-frame-file-in-r")
# replace every non-word character, every digit, and "http"/"https" with a space,
# then collapse the resulting runs of spaces into one and trim the ends
trimws(gsub("\\s+", " ", gsub("\\W|\\d|http\\w?", " ", x, perl = TRUE)))
# [1] "stackoverflow com questions how can i remove punctuations and numbers in text from data frame file in r"
# [2] "stackoverflow com questions how can i remove punctuations and numbers in text from data frame file in r"
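To see why the pattern leaves runs of spaces behind, here is a trace on a short hypothetical URL: every match (the "https" prefix, each ":", "/" and ".", and each digit) is replaced by its own space.

```r
# A short hypothetical URL to trace what each alternative of the pattern matches
u <- "https://a.com/1"
# \W matches any non-word character (":", "/", ".");
# \d matches any digit;
# http\w? matches "http" plus one optional word character, so "https" is matched whole
raw <- gsub("\\W|\\d|http\\w?", " ", u, perl = TRUE)
raw   # "    a com  "

# each match became its own space, so collapse the runs and trim the ends
tidy <- trimws(gsub("\\s+", " ", raw))
tidy  # "a com"
```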
The same task for a data frame of 100,000 rows:
# make sure that your strings are character, not factors (stringsAsFactors = FALSE is the default since R 4.0.0)
df = data.frame(id = 1:1e5, url = rep(x, 1e5/2), stringsAsFactors = FALSE)
# df before replacement
df[1:4, ]
# id url
# 1 1 https://stackoverflow.com/questions/51582369/how-can-i-remove-punctuations-and-numbers-in-text-from-data-frame-file-in-r
# 2 2 http://stackoverflow.com/questions/51582369/how-can-i-remove-punctuations-and-numbers-in-text-from-data-frame-file-in-r
# 3 3 https://stackoverflow.com/questions/51582369/how-can-i-remove-punctuations-and-numbers-in-text-from-data-frame-file-in-r
# 4 4 http://stackoverflow.com/questions/51582369/how-can-i-remove-punctuations-and-numbers-in-text-from-data-frame-file-in-r
# apply the replacement to one column, collapse the repeated spaces,
# and assign the result back to that column
df$url = trimws(gsub("\\s+", " ", gsub("\\W|\\d|http\\w?", " ", df$url, perl = TRUE)))
# check output
df[1:4, ]
# id url
# 1 1 stackoverflow com questions how can i remove punctuations and numbers in text from data frame file in r
# 2 2 stackoverflow com questions how can i remove punctuations and numbers in text from data frame file in r
# 3 3 stackoverflow com questions how can i remove punctuations and numbers in text from data frame file in r
# 4 4 stackoverflow com questions how can i remove punctuations and numbers in text from data frame file in r
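The pattern above keeps the domain and path words (stackoverflow, com, questions). For tweets, where the URL itself is usually noise, an alternative sketch is to drop whole URLs first and only then strip punctuation and digits. The example data frame and its text column are hypothetical:

```r
# Hypothetical tweet data with URLs, digits and punctuation
tweets <- data.frame(
  id = 1:2,
  text = c("Read this: https://example.com/a1 NOW!!", "2 links http://t.co/xyz, wow"),
  stringsAsFactors = FALSE
)

# 1) drop whole URLs: "http" or "https", "://", then everything up to the next whitespace
tweets$text <- gsub("https?://\\S+", " ", tweets$text, perl = TRUE)
# 2) replace runs of punctuation and digits with a single space
tweets$text <- gsub("[[:punct:][:digit:]]+", " ", tweets$text)
# 3) collapse runs of whitespace and trim the ends
tweets$text <- trimws(gsub("\\s+", " ", tweets$text))

tweets$text
```

Because step 1 runs before step 2, the punctuation inside a URL never gets the chance to split it into separate words.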