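I have some text data in which I need to correct English spelling mistakes.
I would like the output as a table: the first column holds each mistake, and the second column holds all of the suggested corrections for it.
For example: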
sentence <- "This is a word but thhis isn't and this onne as well. I need hellp"
library(hunspell)
mistakesList <- hunspell(sentence)[[1]]
suggestionsList <- hunspell_suggest(mistakesList)
I tried
do.call(rbind, Map(data.frame, A=mistakesList, B=suggestionsList))
but it returns one row per suggestion:
A B
thhis thhis this
onne.1 onne none
onne.2 onne one
onne.3 onne tonne
onne.4 onne Donne
onne.5 onne once
onne.6 onne Anne
onne.7 onne Yvonne
hellp.1 hellp hello
hellp.2 hellp hell
hellp.3 hellp help
hellp.4 hellp hell p
I would like it to return a data frame with one row per mistake:
mistakes suggestions
thhis this
onne none one tonne Donne once Anne Yvonne
hellp hello hell help hell p
Answer 0 (score: 1)
We can keep mistakesList as it is and use toString to collapse each element of suggestionsList into a single comma-separated string.
data.frame(mistakes = mistakesList, suggestions = sapply(suggestionsList, toString))
# mistakes suggestions
#1 thhis this
#2 onne none, one, tonne, Donne, once, Anne, neon
#3 hellp hello, hell, help, hell p
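If the space-separated layout shown in the question is preferred over commas, the same idea can be sketched with paste instead of toString. This is only a minimal sketch: the two input objects are hard-coded from the output printed in the question (an assumption, so that the example runs without hunspell installed), and result is just an illustrative name.

```r
# Values hard-coded from the question's output (an assumption made so the
# sketch runs without hunspell installed)
mistakesList <- c("thhis", "onne", "hellp")
suggestionsList <- list(
  "this",
  c("none", "one", "tonne", "Donne", "once", "Anne", "Yvonne"),
  c("hello", "hell", "help", "hell p")
)

# Collapse each suggestion vector into one space-separated string;
# vapply() checks that every element yields exactly one character value
result <- data.frame(
  mistakes = mistakesList,
  suggestions = vapply(suggestionsList, paste, character(1), collapse = " "),
  stringsAsFactors = FALSE
)
print(result)
```

Note that a suggestion such as "hell p" itself contains a space, so a space-separated string cannot be split back into the original suggestions unambiguously. That is one reason a comma separator, as produced by toString, may be the safer choice.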
Answer 1 (score: 0)
This works:
X1 <- do.call(rbind, Map(data.frame, mistakes = mistakesList, suggestions = suggestionsList))
X1
library(plyr)
X2 <- ddply(X1, .(mistakes),summarize,
suggestions = paste(suggestions, collapse=", "))
X2
mistakes suggestions
1 thhis this
2 onne none, one, tonne, Donne, once, Anne, Yvonne
3 hellp hello, hell, help, hell p
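If adding plyr as a dependency is not desirable, the same grouping step can probably be done with aggregate from base R. The following is a sketch under stated assumptions rather than part of the answer above: X1 is rebuilt by hand from the long-format output shown in the question so that the example runs without hunspell, and mistakes is stored as a factor whose levels follow the order of first appearance so that the rows keep that order.

```r
# Long-format data rebuilt by hand from the question's output (an assumption
# made so the sketch runs without hunspell installed)
mistakes <- rep(c("thhis", "onne", "hellp"), times = c(1, 7, 4))
X1 <- data.frame(
  # Factor levels in order of first appearance keep the original row order
  mistakes = factor(mistakes, levels = unique(mistakes)),
  suggestions = c(
    "this",
    "none", "one", "tonne", "Donne", "once", "Anne", "Yvonne",
    "hello", "hell", "help", "hell p"
  ),
  stringsAsFactors = FALSE
)

# Group by mistake and paste that group's suggestions into one string;
# collapse = ", " is forwarded to paste() through aggregate()'s ... argument
X2 <- aggregate(suggestions ~ mistakes, data = X1, FUN = paste, collapse = ", ")
print(X2)
```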