Row<-c(1,2,3,4,5)
Content<-c("I love cheese", "whre is the fish", "Final Countdow", "show me your s", "where is what")
Data<-cbind(Row, Content)
View(Data)
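I want to create a function that tells me how many words are misspelled in each row.
An intermediate step would be to get it to look like this: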
Row<-c(1,2,3,4,5)
Content<-c("I love cheese", "whre is the fs", "Final Countdow", "show me your s", "where is what")
MisspelledWords<-c(NA, "whre, fs", "Countdow","s",NA)
Data<-cbind(Row, Content,MisspelledWords)
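I know I have to use aspell, but I am having trouble running aspell on individual rows rather than directly on the whole file. In the end I want to count how many misspelled words each row has, and for that I would adapt the code from: Count the number of words in a string in R?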
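Answer 0 (score: 5)
To use aspell, you have to use a file. It is pretty straightforward to use a function that dumps a column to a file, runs aspell and gets the counts (but it will not be very efficient if you have a large matrix/data frame, since aspell is run once per row).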
countMispelled <- function(words) {
  # do a bit of cleanup (if necessary): strip punctuation,
  # then collapse runs of spaces into a single space
  words <- gsub(" +", " ", gsub("[[:punct:]]", "", words))
  # aspell() works on files, so write the text to a temporary file
  temp_file <- tempfile()
  writeLines(words, temp_file)
  res <- aspell(temp_file)
  unlink(temp_file)
  # return # of misspelled words
  length(res$Original)
}
Data <- cbind(Data, Errors=unlist(lapply(Data[,2], countMispelled)))
Data
## Row Content Errors
## [1,] "1" "I love cheese" "0"
## [2,] "2" "whre is thed fish" "2"
## [3,] "3" "Final Countdow" "1"
## [4,] "4" "show me your s" "0"
## [5,] "5" "where is what" "0"
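If calling aspell once per row becomes too slow, one possible variation is to write all rows to a single file, run aspell once, and group the result by its Line column (utils::aspell reports Original, File, Line, Column and Suggestions for each hit). The sketch below is only an illustration under assumptions: the names summarise_by_line and spellcheck_rows are made up here, each element of the input is assumed to contain no embedded newline so that file lines map one-to-one to rows, and an aspell, hunspell or ispell program must be installed for the second function to run. It also yields the intermediate MisspelledWords column from the question.

```r
# Hypothetical helper: turn the Line and Original columns of an aspell()
# result into per-row counts and comma-separated word lists for n rows.
summarise_by_line <- function(line, original, n) {
  # split() keeps empty groups for rows that had no hits
  words_by_row <- split(original, factor(line, levels = seq_len(n)))
  data.frame(
    Errors = unname(lengths(words_by_row)),
    MisspelledWords = vapply(
      words_by_row,
      function(w) if (length(w)) paste(w, collapse = ", ") else NA_character_,
      character(1),
      USE.NAMES = FALSE
    ),
    stringsAsFactors = FALSE
  )
}

# Hypothetical wrapper: one aspell() call for the whole column.
# Assumes no element of `text` contains a newline.
spellcheck_rows <- function(text) {
  temp_file <- tempfile()
  on.exit(unlink(temp_file))
  writeLines(text, temp_file)
  res <- utils::aspell(temp_file)  # needs a spell checker program installed
  summarise_by_line(res$Line, res$Original, length(text))
}
```

With the matrix from the question, something like cbind(Data, spellcheck_rows(Data[, "Content"])) should attach both columns. The exact words flagged depend on the installed dictionary.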
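You might be better off using a data frame rather than a matrix (I just used what you provided), since that way you can keep columns such as Row and Errors numeric.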
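A minimal sketch of the difference, using only base R. The names as_matrix, as_df and Words are illustrative, and the word count is just a stand-in numeric column (the error counts from the spell-check function would be attached the same way):

```r
Row <- c(1, 2, 3, 4, 5)
Content <- c("I love cheese", "whre is the fish", "Final Countdow",
             "show me your s", "where is what")

# cbind() on mixed types builds a character matrix, so Row becomes text
as_matrix <- cbind(Row, Content)
is.character(as_matrix)  # TRUE

# a data frame stores each column with its own type
as_df <- data.frame(Row, Content, stringsAsFactors = FALSE)

# stand-in numeric column: number of space-separated words per row
as_df$Words <- lengths(strsplit(as_df$Content, " +"))

is.numeric(as_df$Row)    # TRUE
is.numeric(as_df$Words)  # TRUE
```

Because the columns keep their types, they can be summed, sorted or filtered numerically without converting back from character first.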
Answer 1 (score: 4)
Inspired by this article, try which_misspelled and check_spelling from library(qdap).
library(qdap)
# which_misspelled
n_misspelled <- sapply(Content, function(x){
length(which_misspelled(x, suggest = FALSE))
})
data.frame(Content, n_misspelled, row.names = NULL)
# Content n_misspelled
# 1 I love cheese 0
# 2 whre is the fs 2
# 3 Final Countdow 1
# 4 show me your s 0
# 5 where is what 0
# check_spelling
df <- check_spelling(Content, n.suggest = 0)
n_misspelled <- as.vector(table(factor(df$row, levels = Row)))
data.frame(Content, n_misspelled)
# Content n_misspelled
# 1 I love cheese 0
# 2 whre is the fs 2
# 3 Final Countdow 1
# 4 show me your s 0
# 5 where is what 0