Question

I want to build a simple spelling-correction system, and I have a dataset like this:

incorrect_word,correct_word
scoohl,school
watn,want
frienf,friend

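"I watn to go scoohl"
I want to correct this sentence by replacing each word that appears in the `incorrect_word` column with the matching entry in the `correct_word` column, if one exists. How should I do this?
The sample code I wrote does not work properly.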

text = " شما رفتین مدرسه شون گفتین دستاشون رو بشورن"
# if "دستاشون رو" in text:
#     print("yes")
from hazm import *
import pandas as pd
from src.config.config import *

# letters = word_tokenize(text)
# for text in word_tokenize(text):
#     print(text)
df = pd.read_excel(FILL_DATA).astype(str)
text = str(text)
for idx, item in enumerate(df['informal']):
    if item in text:

        text = text.replace(item, df['formal1'].iloc[idx])
        # item = item.replace(df['informal'].iloc[idx], df['formal1'].iloc[idx])
print(text)

Answer 1

I would do it like this:

import pandas as pd

df = pd.DataFrame([['scoohl','school'], ['watn','want'], ['frienf','friend']], columns=['incorrect_word', 'correct_word'])
# Index the table by the misspelled word so each word can be looked up directly.
df = df.set_index('incorrect_word')

text_to_correct = "I watn to go scoohl"

words = text_to_correct.split(' ')

# Replace every word that appears in the index with its correction.
for c, w in enumerate(words):
    if w in df.index:
        words[c] = df.at[w, 'correct_word']

words = ' '.join(words)
words

Result:

'I want to go school'
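
One caveat: splitting on spaces will miss a misspelling that is followed by punctuation (for example `scoohl,`). A minimal sketch of one way around that, assuming the same word pairs as above, is to replace whole-word matches with a regular expression. The helper name `correct_text` is made up for illustration, and the dict is written out by hand here so the snippet runs on its own.

```python
import re

# Misspelling -> correction. With the DataFrame from the question this could be
# built as dict(zip(df['incorrect_word'], df['correct_word'])).
corrections = {'scoohl': 'school', 'watn': 'want', 'frienf': 'friend'}

# One pattern that matches any misspelling as a whole word. Longer entries go
# first so a multi-word entry is tried before a shorter entry it contains.
keys = sorted(corrections, key=len, reverse=True)
pattern = re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in keys) + r')\b')

def correct_text(text):
    """Replace every whole-word misspelling found in the table (hypothetical helper)."""
    return pattern.sub(lambda m: corrections[m.group(0)], text)

print(correct_text("I watn to go scoohl, my frienf."))
```

Because of the `\b` word boundaries, a misspelling that only occurs inside a longer word is left alone, which plain `str.replace` would not guarantee.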

Answer 2

Hello, this is very basic Python; you can do it this way:

df['incorrect']=[x for x in df['Correct'] if len(x)>2]

You should read up on lambda, list comprehensions, apply, and map.
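
As a rough sketch of what `map` and `apply` could look like for this problem: the `sentence` column, the `table` and `lookup` names, and the `fix_sentence` helper are all assumptions for illustration, not part of the question's data.

```python
import pandas as pd

table = pd.DataFrame(
    [['scoohl', 'school'], ['watn', 'want'], ['frienf', 'friend']],
    columns=['incorrect_word', 'correct_word'],
)
# Series indexed by the misspelling, so lookups go incorrect -> correct.
lookup = table.set_index('incorrect_word')['correct_word']

# map: correct a Series of single words. Words missing from the table come
# back as NaN, so fall back to the original word with fillna.
words = pd.Series("I watn to go scoohl".split())
fixed_words = words.map(lookup).fillna(words)
print(' '.join(fixed_words))

# apply: correct a whole column of sentences with a small function.
def fix_sentence(sentence):
    return ' '.join(lookup.get(w, w) for w in sentence.split())

sentences = pd.DataFrame({'sentence': ["I watn to go scoohl", "my frienf is here"]})
sentences['corrected'] = sentences['sentence'].apply(fix_sentence)
print(sentences['corrected'].tolist())
```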

Thanks.
