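Hi,
Basically, what I want to do is run a fuzzy match based on city and state code and assign the returned values to a new set of columns in the DataFrame. I have commented out my attempts below. Some attempts give me a warning, while another gives me both an error and a warning.
I know it is only a warning, but I would like to know how to resolve it.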
def fuzzymatch_get_ratio(row):
    city, state_code = row[['City', 'State Code']]
    print('City = ', city, 'State code = ', state_code)
    cities = df_uszips[df_uszips.state_id==state_code]['city'].str.lower().unique()
    print(process.extractOne(city, cities, scorer=fuzz.ratio))
    return process.extractOne(city, cities, scorer=fuzz.ratio)
# Both of these produce the warning
#test['new_city_name'], test['score'] = zip(*test.loc[:, ['City', 'State Code']].apply(fuzzymatch_get_ratio, axis=1))
#test.loc[:, 'new_city_name'], test.loc[:, 'score'] = zip(*test.loc[:, ['City', 'State Code']].apply(fuzzymatch_get_ratio, axis=1))
# This produces the warning and the error: object of type 'zip' has no len()
#test[['new_city_name', 'score']] = zip(*test.apply(fuzzymatch_get_ratio, axis=1))
The warning message is:
/opt/conda/lib/python3.6/site-packages/ipykernel_launcher.py:10: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
# Remove the CWD from sys.path while we load stuff.
Thanks
Answer 0 (score: 1)
You can return a pd.Series from the function, build a new DataFrame from the results, and join it to the original:
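import pandas as pd
# Assumes the fuzzywuzzy package, which matches the calls used in the question.
from fuzzywuzzy import fuzz, process

def fuzzymatch_get_ratio(row):
    city, state_code = row['City'], row['State Code']
    print('City = ', city, 'State code = ', state_code)
    # Candidate city names for this row's state, lowercased and de-duplicated.
    cities = df_uszips.loc[df_uszips.state_id==state_code, 'city'].str.lower().unique()
    # Run the match once and reuse the (name, score) result.
    match = process.extractOne(city, cities, scorer=fuzz.ratio)
    print(match)
    return pd.Series(match)

# Each returned Series becomes one row of a two-column DataFrame.
test1 = test.apply(fuzzymatch_get_ratio, axis=1)
test1.columns = ['new_city_name','score']
test = test.join(test1)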
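To try the return-a-Series-and-join pattern without the real data, here is a minimal self-contained sketch. The toy frames and the helper name `best_match` are hypothetical, and `difflib.SequenceMatcher` from the standard library stands in for `process.extractOne(..., scorer=fuzz.ratio)`, so the scores may differ slightly from what fuzzywuzzy would report.

```python
import difflib

import pandas as pd

# Toy stand-ins for the question's data (hypothetical values).
df_uszips = pd.DataFrame({
    "city": ["Springfield", "Chicago", "Houston", "Austin"],
    "state_id": ["IL", "IL", "TX", "TX"],
})
test = pd.DataFrame({
    "City": ["chicgo", "houstn"],
    "State Code": ["IL", "TX"],
})


def best_match(row):
    """Return the closest city name in the row's state and a 0-100 score."""
    city, state_code = row["City"], row["State Code"]
    cities = df_uszips.loc[df_uszips.state_id == state_code, "city"].str.lower().unique()
    # difflib is a stand-in for fuzzywuzzy here, not the same scorer.
    scored = [
        (c, round(100 * difflib.SequenceMatcher(None, city.lower(), c).ratio()))
        for c in cities
    ]
    # Fall back to (None, 0) when the state has no candidate cities.
    name, score = max(scored, key=lambda pair: pair[1], default=(None, 0))
    # Naming the Series index gives the resulting columns their names directly.
    return pd.Series({"new_city_name": name, "score": score})


matches = test.apply(best_match, axis=1)  # DataFrame with two named columns
result = test.join(matches)               # aligned on the shared row index
print(result)
```

Because the returned Series already carries the labels `new_city_name` and `score`, the separate step of renaming the columns is not needed in this variant.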
Another solution should be:
test[['new_city_name', 'score']] = test.apply(fuzzymatch_get_ratio, axis=1)
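As for the warning itself: SettingWithCopyWarning typically appears when the frame being assigned to was created by slicing or filtering another DataFrame. The question does not show how `test` was built, so this is an assumption, but if that is the case, taking an explicit copy before adding columns usually removes the warning. A minimal sketch with a hypothetical parent frame `df` and a placeholder assignment:

```python
import pandas as pd

# Hypothetical parent frame; the "keep" column exists only for this sketch.
df = pd.DataFrame({
    "City": ["chicgo", "houstn", "bostn"],
    "State Code": ["IL", "TX", "MA"],
    "keep": [True, True, False],
})

# Filtering yields a derived frame; .copy() makes it an independent object,
# so adding columns to it no longer touches (or warns about) the parent.
test = df[df["keep"]].copy()

# Placeholder assignment; the fuzzy-match result columns would go here instead.
test["new_city_name"] = test["City"].str.lower()
print(test)
```

The parent `df` is left unchanged, and the new column exists only on `test`.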