How to convert pandas data to text data

Time: 2020-10-28 09:54:41

Tags: python pandas nlp

I want to build a simple spelling-correction system, and I have a DataFrame like this:

incorrect_word, correct_word   
scoohl,school  
watn,want  
frienf,friend

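"I watn to go scoohl"
I want to correct this sentence by replacing each word that appears in the "incorrect_word" column with the matching entry in the "correct_word" column, if one exists. How can I do this?
The sample code I wrote does not work correctly.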


text = " شما رفتین مدرسه شون گفتین دستاشون رو بشورن"
# if "دستاشون رو" in text:
#     print("yes")
from hazm import *
import pandas as pd
from src.config.config import *

# letters = word_tokenize(text)
# for text in word_tokenize(text):
#     print(text)
df = pd.read_excel(FILL_DATA).astype(str)
text = str(text)
for idx, item in enumerate(df['informal']):
    if item in text:

        text = text.replace(item, df['formal1'].iloc[idx])
        # item = item.replace(df['informal'].iloc[idx], df['formal1'].iloc[idx])
print(text)

2 answers:

Answer 0: (score: 1)

I would do it like this:

import pandas as pd

# Lookup table indexed by the misspelled word
df = pd.DataFrame([['scoohl','school'], ['watn','want'], ['frienf','friend']], columns=['incorrect_word', 'correct_word'])
df = df.set_index('incorrect_word')

text_to_correct = "I watn to go scoohl"

words = text_to_correct.split(' ')

# Replace each word that appears in the index with its correction
for c, w in enumerate(words):
    if w in df.index:
        words[c] = df.at[w, 'correct_word']

words = ' '.join(words)
words

Result:

'I want to go school'
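
One limitation of splitting on spaces is that a misspelled word followed by punctuation (for example "scoohl,") is not found in the index. A possible variation, sketched here under the assumption that the corrections fit in a plain dict, is to build a single regular expression with word boundaries. The names `corrections`, `pattern` and `correct_text` are made up for this illustration and are not part of the answer above.

```python
import re

import pandas as pd

df = pd.DataFrame(
    [["scoohl", "school"], ["watn", "want"], ["frienf", "friend"]],
    columns=["incorrect_word", "correct_word"],
)

# Plain dict: misspelled word -> correction
corrections = dict(zip(df["incorrect_word"], df["correct_word"]))

# One alternation of all misspellings, wrapped in \b so only whole words match
pattern = re.compile(r"\b(" + "|".join(map(re.escape, corrections)) + r")\b")


def correct_text(text):
    """Replace every whole-word occurrence of a known misspelling."""
    return pattern.sub(lambda m: corrections[m.group(1)], text)


print(correct_text("I watn to go scoohl, my frienf."))
```

Because of the word boundaries, a misspelling that only appears inside a longer word is left alone, which may or may not be what you want.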

Answer 1: (score: 0)

Hello, this is very basic Python; you can do it this way:

df['incorrect']=[x for x in df['Correct'] if len(x)>2]

You should read up on lambda, list comprehensions, apply, and map.
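
As a rough illustration of how those four ideas could be applied to the table from the question (this is only a sketch; the `sentences` DataFrame, its "sentence" column and the helper `correct_sentence` are hypothetical and do not come from the question):

```python
import pandas as pd

df = pd.DataFrame(
    [["scoohl", "school"], ["watn", "want"], ["frienf", "friend"]],
    columns=["incorrect_word", "correct_word"],
)

# Plain dict: misspelled word -> correction
corrections = dict(zip(df["incorrect_word"], df["correct_word"]))


def correct_sentence(sentence):
    # List comprehension: keep the word unless a correction is known
    return " ".join([corrections.get(w, w) for w in sentence.split()])


# apply + lambda over a (hypothetical) column of sentences
sentences = pd.DataFrame({"sentence": ["I watn to go scoohl", "my frienf is here"]})
sentences["corrected"] = sentences["sentence"].apply(lambda s: correct_sentence(s))

# map + lambda over a Series of single words
words = pd.Series("I watn to go scoohl".split())
mapped = words.map(lambda w: corrections.get(w, w))

print(sentences["corrected"].tolist())
print(" ".join(mapped))
```

Using `corrections.get(w, w)` rather than passing the dict straight to `map` keeps unknown words unchanged instead of turning them into NaN.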

Thanks.