With the TextBlob library, you can improve the spelling of a string by first defining it as a TextBlob object and then calling its correct() method.
Example:
from textblob import TextBlob
data = TextBlob('Two raods diverrged in a yullow waod and surry I culd not travl bouth')
print(data.correct())
Two roads diverged in a yellow wood and sorry I could not travel both
Is it possible to do the same for the strings in a pandas DataFrame column, such as this one:
import pandas as pd
data = [{'one': '3', 'two': 'two raods'},
{'one': '7', 'two': 'diverrged in a yullow'},
{'one': '8', 'two': 'waod and surry I'},
{'one': '9', 'two': 'culd not travl bouth'}]
df = pd.DataFrame(data)
df
   one                    two
0    3              two raods
1    7  diverrged in a yullow
2    8       waod and surry I
3    9   culd not travl bouth
and get back the same DataFrame with the spelling in column two corrected, using TextBlob or another method?
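For background on what correct() does: TextBlob's documentation describes its spelling correction as based on Peter Norvig's "How to Write a Spelling Corrector". The following is a minimal, stdlib-only sketch of that idea, not TextBlob's actual code. The frequency table WORDS is a hypothetical toy (a real corrector learns counts from a large corpus), and edits1, known, correct_word and correct_text are illustrative names. For each word it prefers the word itself if known, then known words one edit away, then two edits away, and picks the most frequent candidate.

```python
import re
from collections import Counter

# Hypothetical toy word counts; a real corrector learns these from a large corpus.
WORDS = Counter({
    'two': 50, 'roads': 10, 'diverged': 5, 'in': 100, 'a': 120, 'yellow': 8,
    'wood': 7, 'and': 110, 'sorry': 6, 'i': 90, 'could': 40, 'not': 80,
    'travel': 9, 'both': 30,
})
LETTERS = 'abcdefghijklmnopqrstuvwxyz'


def edits1(word):
    """Every string one delete, transpose, replace or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def known(words):
    """Subset of words that appear in the frequency table."""
    return {w for w in words if w in WORDS}


def correct_word(word):
    """Most frequent known candidate, preferring candidates with fewer edits."""
    candidates = (known([word])
                  or known(edits1(word))
                  or known(e2 for e1 in edits1(word) for e2 in edits1(e1))
                  or {word})
    return max(candidates, key=lambda w: WORDS[w])


def correct_text(text):
    # Lower-cases the input and drops punctuation: a simplification for this sketch.
    return ' '.join(correct_word(w) for w in re.findall(r'[a-z]+', text.lower()))


print(correct_text('Two raods diverrged in a yullow waod and surry I culd not travl bouth'))
```

Unlike TextBlob, this sketch does not preserve capitalisation, so "Two" and "I" come back lower-cased.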
Answer 0 (score: 2)
You can do the following:
from textblob import TextBlob
df['two'] = df['two'].apply(lambda txt: str(TextBlob(txt).correct()))
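Below is a sketch of the same column-wise approach, written so it runs without TextBlob installed. The names correct_column, toy_correct and TOY_FIXES are hypothetical: toy_correct is only a fixed lookup of the misspellings in the question, standing in for a real corrector such as lambda s: str(TextBlob(s).correct()). As an assumption about where time might be saved, correct_column corrects each distinct value once and maps the results back, which only helps when the column contains repeated strings.

```python
import pandas as pd

# Hypothetical stand-in corrector: a fixed lookup of the misspellings in the question.
# With TextBlob installed you could pass lambda s: str(TextBlob(s).correct()) instead.
TOY_FIXES = {'raods': 'roads', 'diverrged': 'diverged', 'yullow': 'yellow',
             'waod': 'wood', 'surry': 'sorry', 'culd': 'could',
             'travl': 'travel', 'bouth': 'both'}


def toy_correct(text):
    """Replace each whitespace-separated word that appears in TOY_FIXES."""
    return ' '.join(TOY_FIXES.get(word, word) for word in text.split())


def correct_column(df, column, corrector):
    """Return a copy of df with corrector applied to every string in column.

    Each distinct value is corrected only once and the results are mapped back,
    which avoids repeated work when the column contains duplicates.
    """
    out = df.copy()
    fixes = {value: corrector(value) for value in out[column].unique()}
    out[column] = out[column].map(fixes)
    return out


data = [{'one': '3', 'two': 'two raods'},
        {'one': '7', 'two': 'diverrged in a yullow'},
        {'one': '8', 'two': 'waod and surry I'},
        {'one': '9', 'two': 'culd not travl bouth'}]
df = pd.DataFrame(data)
fixed = correct_column(df, 'two', toy_correct)
print(fixed)
```

Returning a copy leaves the original df untouched, so the corrected and uncorrected text can be compared side by side.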
Answer 1 (score: 1)
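I am still looking for a faster method. However, I think there is a Python library called autocorrect that can help with spelling correction. I timed the two libraries (autocorrect and textblob) on demo data, and these are the results I got: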
%%timeit
spell_correct_tb(['haave', 'naame'])
The slowest run took 4.36 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 505 µs per loop
%%timeit
spell_correct_autocorrect(['haave', 'naame'])
The slowest run took 4.80 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 303 µs per loop
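This suggests that autocorrect runs faster, at least on this two-word sample (or is my assumption wrong?). However, I am not sure how accurate each of the two libraries is.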
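One way to check both speed and accuracy on the same data is a small harness like the sketch below. The helpers accuracy, best_time and compare are hypothetical names, and LABELLED pairs each misspelling from the question with the spelling TextBlob produced for it in the question's example. The two correctors used here are toys so the sketch runs without either library installed; real ones might be lambda w: str(TextBlob(w).correct()) and, in recent autocorrect releases, a Speller instance (check your installed version). A larger labelled sample would give a fairer comparison than two words.

```python
import timeit

# Misspellings from the question paired with the spellings TextBlob produced for them.
LABELLED = [('raods', 'roads'), ('diverrged', 'diverged'), ('yullow', 'yellow'),
            ('waod', 'wood'), ('surry', 'sorry'), ('culd', 'could'),
            ('travl', 'travel'), ('bouth', 'both')]


def accuracy(corrector, pairs):
    """Fraction of misspellings that the corrector maps to the intended word."""
    return sum(corrector(wrong) == right for wrong, right in pairs) / len(pairs)


def best_time(corrector, words, number=1000, repeat=3):
    """Best-of-`repeat` seconds for one pass over words, in the spirit of %%timeit."""
    timer = timeit.Timer(lambda: [corrector(w) for w in words])
    return min(timer.repeat(repeat=repeat, number=number)) / number


def compare(correctors, pairs):
    """Map each corrector name to (accuracy, best seconds per pass)."""
    words = [wrong for wrong, _ in pairs]
    return {name: (accuracy(fn, pairs), best_time(fn, words))
            for name, fn in correctors.items()}


# Toy correctors so the sketch runs stand-alone; swap in the real libraries to compare them.
lookup = dict(LABELLED)
results = compare({'identity': lambda w: w,
                   'lookup': lambda w: lookup.get(w, w)}, LABELLED)
for name, (acc, secs) in results.items():
    print(f'{name}: accuracy={acc:.2f}, {secs * 1e6:.1f} µs per pass')
```

Taking the minimum over repeats, as %%timeit's "best of 3" does, reduces the influence of caching and background noise that the timing warnings above mention.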
Note: you can install autocorrect with pip by running pip install autocorrect