I have some data that contains spelling errors. For example:
# Define the correct spellings:
Li_A = ["potato", "tomato", "squash", "apple", "pear"]
# Define the data that contains spelling errors:
B = {'one' : pd.Series(["potat0", "toma3o", "s5uash", "ap8le", "pea7"], index=['a', 'b', 'c', 'd', 'e']),
'two' : pd.Series(["po1ato", "2omato", "squ0sh", "2pple", "p3ar"], index=['a', 'b', 'c', 'd', 'e'])}
df_B = pd.DataFrame(B)
I am trying to correct them with the following code:
import pandas as pd
import difflib
# Define the function that corrects the spelling:
def Spelling(ask):
    difflib.get_close_matches(ask, Li_A, n=1, cutoff=0.5)
# Apply the function that corrects the spelling:
for index,row in df_B.iterrows():
    df_B.loc[index,'Correct one'] = Spelling(df_B['one'])
for index,row in df_B.iterrows():
    df_B.loc[index,'Correct two'] = Spelling(df_B['two'])
df_B
But all I get is:
      one     two Correct one Correct two
a  potat0  po1ato         NaN         NaN
b  toma3o  2omato         NaN         NaN
c  s5uash  squ0sh         NaN         NaN
d   ap8le   2pple         NaN         NaN
e    pea7    p3ar         NaN         NaN
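How can I add the correct spellings to my DataFrame as new columns, in place of the "NaN" values shown above?
It does work when I run it on a single word at a time: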
import difflib
Li_A = ["potato", "tomato", "squash", "apple", "pear"]
B = 'potat0'
C = difflib.get_close_matches(B, Li_A, n=1, cutoff=0.5)
C
Out: ['potato']
Answer 0 (score: 2)
You forgot the return in your function. Also, with iterrows, take the value for each iteration from row, and call iterrows only once (although applymap would be simpler still):
def Spelling(ask):
    return difflib.get_close_matches(ask, Li_A, n=1, cutoff=0.5)
# Apply the function that corrects the spelling:
for index,row in df_B.iterrows():
    df_B.loc[index,'Correct one'] = Spelling(row['one'])
    df_B.loc[index,'Correct two'] = Spelling(row['two'])
print (df_B)
      one     two Correct one Correct two
a  potat0  po1ato    [potato]    [potato]
b  toma3o  2omato    [tomato]    [tomato]
c  s5uash  squ0sh    [squash]    [squash]
d   ap8le   2pple     [apple]     [apple]
e    pea7    p3ar      [pear]      [pear]
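The simpler element-wise approach mentioned above could look roughly like the sketch below. It is not the answerer's own code. It assumes the data from the question, and it uses Series.map on each column rather than applymap, because applymap is deprecated in recent pandas releases in favour of DataFrame.map. The helper name best_match is made up for this example. It also unwraps the one-element list that get_close_matches returns, so each cell holds a plain string, or None when nothing passes the cutoff.

```python
import difflib

import pandas as pd

# Correct spellings and misspelled data, copied from the question.
Li_A = ["potato", "tomato", "squash", "apple", "pear"]
df_B = pd.DataFrame({
    "one": pd.Series(["potat0", "toma3o", "s5uash", "ap8le", "pea7"],
                     index=["a", "b", "c", "d", "e"]),
    "two": pd.Series(["po1ato", "2omato", "squ0sh", "2pple", "p3ar"],
                     index=["a", "b", "c", "d", "e"]),
})


def best_match(word):
    """Hypothetical helper: closest entry in Li_A, or None if nothing is close enough."""
    matches = difflib.get_close_matches(word, Li_A, n=1, cutoff=0.5)
    # get_close_matches returns a list; unwrap it so the cell holds a string.
    return matches[0] if matches else None


# Apply the helper to every value of each column; no explicit row loop needed.
for col in ["one", "two"]:
    df_B["Correct " + col] = df_B[col].map(best_match)

print(df_B)
```

If the list form shown in the answer's output is preferred, best_match can return matches unchanged instead.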