Combining lists of strings of different lengths into a data frame

Time: 2018-11-30 01:26:21

Tags: r text mining hunspell

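I have some text data in which I need to correct English spelling mistakes.

I want the output to be a table in which the first column holds each mistake and the second column holds all of the suggested corrections for it.

For example: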

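sentence <- "This is a word but thhis isn't and this onne as well. I need hellp"

library(hunspell)
# hunspell() returns one character vector of misspelled words per input string
mistakesList <- hunspell(sentence)[[1]]
# hunspell_suggest() returns a list with one vector of suggestions per mistake
suggestionsList <- hunspell_suggest(mistakesList)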

I tried

do.call(rbind, Map(data.frame, A=mistakesList, B=suggestionsList))

but it returns one row per suggestion:

            A      B
thhis   thhis   this
onne.1   onne   none
onne.2   onne    one
onne.3   onne  tonne
onne.4   onne  Donne
onne.5   onne   once
onne.6   onne   Anne
onne.7   onne Yvonne
hellp.1 hellp  hello
hellp.2 hellp   hell
hellp.3 hellp   help
hellp.4 hellp hell p

I want a data frame with one row per mistake instead:

mistakes suggestions
thhis   this
onne    none one tonne Donne once Anne Yvonne
hellp   hello hell help hell p

2 answers:

Answer 0: (score: 1)

We can keep mistakesList as it is and use toString to collapse each element of suggestionsList into a single comma-separated string.

data.frame(mistakes = mistakesList, suggestions = sapply(suggestionsList, toString))


#  mistakes                               suggestions
#1    thhis                                      this
#2     onne none, one, tonne, Donne, once, Anne, neon
#3    hellp                 hello, hell, help, hell p
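
The same idea can be tried without hunspell installed. The following is only a minimal sketch: the two objects are hand-written stand-ins that mimic the shape of the hunspell results (a character vector and a list of character vectors), not real hunspell output. It uses vapply instead of sapply so that the result is guaranteed to be a character vector with one string per mistake.

```r
# Hand-written stand-ins for the hunspell results (assumed data, for illustration only)
mistakesList <- c("thhis", "onne", "hellp")
suggestionsList <- list(
  "this",
  c("none", "one", "tonne"),
  c("hello", "hell", "help")
)

# Collapse each vector of suggestions into one comma-separated string,
# so that every mistake ends up with exactly one row
suggestions <- vapply(suggestionsList, toString, character(1))

result <- data.frame(
  mistakes = mistakesList,
  suggestions = suggestions,
  stringsAsFactors = FALSE
)
print(result)
```

Because toString returns a single string even for an empty vector, a mistake with no suggestions should still get a row, with an empty suggestions cell.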

Answer 1: (score: 0)

This works:

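# Long format: one row per (mistake, suggestion) pair
X1 <- do.call(rbind, Map(data.frame, mistakes = mistakesList, suggestions = suggestionsList))
X1

library(plyr)

# Group by mistake and paste that group's suggestions into one string
X2 <- ddply(X1, .(mistakes), summarize,
            suggestions = paste(suggestions, collapse = ", "))
X2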


mistakes                                 suggestions
1 thhis                                        this
2  onne none, one, tonne, Donne, once, Anne, Yvonne
3 hellp                   hello, hell, help, hell p
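
If loading plyr is not desirable, the grouping step can probably also be done in base R. The sketch below swaps ddply for aggregate with toString; it is an alternative, not part of the answer above. The input lists are hand-written stand-ins for the hunspell results, and the long-format frame is built with rep and unlist rather than Map. Note that aggregate returns the groups sorted by the grouping column, so the row order may differ from the order in which the mistakes appear in the text.

```r
# Hand-written stand-ins for the hunspell results (assumed data, for illustration only)
mistakesList <- c("thhis", "onne", "hellp")
suggestionsList <- list(
  "this",
  c("none", "one", "tonne"),
  c("hello", "hell", "help")
)

# Long format: repeat each mistake once per suggestion it received
X1 <- data.frame(
  mistakes = rep(mistakesList, lengths(suggestionsList)),
  suggestions = unlist(suggestionsList),
  stringsAsFactors = FALSE
)

# Collapse the suggestions within each mistake into one comma-separated string
X2 <- aggregate(suggestions ~ mistakes, data = X1, FUN = toString)
print(X2)
```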