I want to build a simple spelling-correction system, and I have a lookup table like this:
incorrect_word, correct_word
scoohl,school
watn,want
frienf,friend
"I watn to go scoohl"
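I want to correct this sentence by replacing each word found in the "incorrect_word" column with the matching entry in the "correct_word" column, if one exists.
How should I do this?
The sample code I wrote does not work properly.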
text = " شما رفتین مدرسه شون گفتین دستاشون رو بشورن"
# if "دستاشون رو" in text:
#     print("yes")
from hazm import *
import pandas as pd
from src.config.config import *
# letters = word_tokenize(text)
# for text in word_tokenize(text):
#     print(text)
df = pd.read_excel(FILL_DATA).astype(str)
text = str(text)
for idx, item in enumerate(df['informal']):
    if item in text:
        text = text.replace(item, df['formal1'].iloc[idx])
        # item = item.replace(df['informal'].iloc[idx], df['formal1'].iloc[idx])
print(text)
Answer 0 (score: 1)
I would do it like this:
import pandas as pd

# Lookup table: misspelled word -> correct word
df = pd.DataFrame([['scoohl','school'], ['watn','want'], ['frienf','friend']], columns=['incorrect_word', 'correct_word'])
# Index by the misspelled word so each lookup is a simple index test
df = df.set_index('incorrect_word')
text_to_correct = "I watn to go scoohl"
words = text_to_correct.split(' ')
for c, w in enumerate(words):
    # Replace the word only if it appears in the table
    if w in df.index:
        words[c] = df.at[w, 'correct_word']
words = ' '.join(words)
words
Result:
'I want to go school'
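Splitting on spaces will miss a word that is followed by punctuation, and it cannot match table entries that span more than one word, which the question's own table may contain. One possible variation, sketched below, builds a single regular expression from the table and replaces whole-word matches only. The helper name `correct_text` and the sample sentence are illustrative assumptions, not part of the original answer.

```python
import re

import pandas as pd

# Same illustrative table as in the answer above
df = pd.DataFrame(
    [["scoohl", "school"], ["watn", "want"], ["frienf", "friend"]],
    columns=["incorrect_word", "correct_word"],
)

# Plain dict: misspelled entry -> correction
corrections = dict(zip(df["incorrect_word"], df["correct_word"]))

# One alternation over all entries, longest first so that longer
# (possibly multi-word) entries are tried before shorter ones.
# The lookarounds require that no word character touches the match,
# so an entry is never replaced inside a longer word.
alternatives = "|".join(
    re.escape(key) for key in sorted(corrections, key=len, reverse=True)
)
pattern = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)")


def correct_text(text: str) -> str:
    """Hypothetical helper: replace each whole-word match with its correction."""
    return pattern.sub(lambda m: corrections[m.group(0)], text)


print(correct_text("I watn to go scoohl, my frienf."))
```

In Python 3, `\w` is Unicode-aware for `str` patterns, so the same idea should carry over to non-Latin text such as the Persian sentence in the question, though that has not been checked against the asker's data.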
Answer 1 (score: 0)
Hello, this is very basic Python; you can do it this way:
df['incorrect']=[x for x in df['Correct'] if len(x)>2]
You should read up on lambda, list comprehensions, apply, and map.
Thanks.
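As a rough illustration of how the list-comprehension and `map` ideas mentioned in this answer could apply to the question's task, here is a minimal sketch. It assumes the table from the question, plus a made-up second sentence and a hypothetical helper `fix_sentence`; it is one way to do it, not the answerer's own code.

```python
import pandas as pd

# Lookup table from the question
df = pd.DataFrame(
    [["scoohl", "school"], ["watn", "want"], ["frienf", "friend"]],
    columns=["incorrect_word", "correct_word"],
)
corrections = dict(zip(df["incorrect_word"], df["correct_word"]))


def fix_sentence(sentence: str) -> str:
    """Hypothetical helper: swap each known misspelling, keep other words."""
    # dict.get(w, w) falls back to the original word when there is no entry
    return " ".join([corrections.get(w, w) for w in sentence.split()])


# Single sentence, using the list comprehension inside the helper
fixed = fix_sentence("I watn to go scoohl")

# A whole column of sentences, using Series.map
sentences = pd.Series(["I watn to go scoohl", "my frienf is here"])
fixed_sentences = sentences.map(fix_sentence)

print(fixed)
print(fixed_sentences.tolist())
```

`Series.apply(fix_sentence)` would behave the same way here; `map` is used only because the function takes one element at a time.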