Question

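Hi,

Basically, what I'm trying to do is run a fuzzy match on the city and state code and assign the returned values to a new set of columns in the DataFrame. I have commented out my attempts below. Some of the attempts give me a warning, while another gives me both an error and a warning.

I know it's only a warning, but I would like to know how to fix it.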

def fuzzymatch_get_ratio(row):
    city, state_code = row[['City', 'State Code']]
    print('City = ', city, 'State code = ',state_code)
    cities = df_uszips[df_uszips.state_id==state_code]['city'].str.lower().unique()    
    print(process.extractOne(city, cities, scorer=fuzz.ratio))
    return process.extractOne(city, cities, scorer=fuzz.ratio)


# Warn
#test['new_city_name'], test['score'] = zip(*test.loc[:, ['City', 'State Code']].apply(fuzzymatch_get_ratio, axis=1))
#test.loc[:, 'new_city_name'], test.loc[:, 'score'] = zip(*test.loc[:, ['City', 'State Code']].apply(fuzzymatch_get_ratio, axis=1))

# Warn & object of type 'zip' has no len()
#test[['new_city_name', 'score']] = zip(*test.apply(fuzzymatch_get_ratio, axis=1))
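A quick illustration of where the "object of type 'zip' has no len()" error comes from: in Python 3, zip() returns a lazy iterator rather than a list, so anything that asks for its length fails (presumably pandas does this when assigning to a list of columns). Tuple unpacking, as in the first two attempts, consumes the iterator instead, which is why those run and only warn. The (city, score) pairs below are made up for illustration, not taken from the question's data.

```python
# Hypothetical (city, score) results, standing in for what extractOne returns.
pairs = [("austin", 95), ("dallas", 88)]

# zip() in Python 3 is a lazy iterator, so len() on it raises TypeError.
zipped = zip(*pairs)
try:
    len(zipped)
except TypeError as exc:
    print(exc)  # object of type 'zip' has no len()

# Tuple unpacking consumes the iterator, so this form works.
names, scores = zip(*pairs)
print(names)   # ('austin', 'dallas')
print(scores)  # (95, 88)

# Materialising the iterator gives an object that does have a length.
print(len(list(zip(*pairs))))  # 2
```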

The warning message is:

/opt/conda/lib/python3.6/site-packages/ipykernel_launcher.py:10: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  # Remove the CWD from sys.path while we load stuff.

Thanks

Answer 1

You can return a pd.Series from the function, so that apply builds a new DataFrame, and then join that DataFrame to the original:

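import pandas as pd
# Assumed library: the question does not say which package provides process/fuzz.
from fuzzywuzzy import fuzz, process

def fuzzymatch_get_ratio(row):
    city, state_code = row['City'], row['State Code']
    print('City = ', city, 'State code = ', state_code)
    # .loc with a boolean mask and a column label avoids chained indexing
    cities = df_uszips.loc[df_uszips.state_id == state_code, 'city'].str.lower().unique()
    match = process.extractOne(city, cities, scorer=fuzz.ratio)
    print(match)
    # Returning a Series makes apply(axis=1) build a DataFrame, one column per element
    return pd.Series(match)

test1 = test.apply(fuzzymatch_get_ratio, axis=1)
test1.columns = ['new_city_name', 'score']
test = test.join(test1)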
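A self-contained sketch of the same "return a Series, then join" pattern, runnable without the question's data. Assumptions: df_uszips and test are small hypothetical frames, and best_match is a hypothetical stand-in for process.extractOne built on the standard library's difflib.SequenceMatcher, so its scores will not equal fuzz.ratio.

```python
import difflib

import pandas as pd

# Hypothetical stand-ins for the question's frames.
df_uszips = pd.DataFrame({
    "city": ["Austin", "Dallas", "Houston", "Portland", "Salem"],
    "state_id": ["TX", "TX", "TX", "OR", "OR"],
})
test = pd.DataFrame({
    "City": ["austn", "dalas", "portlnd"],
    "State Code": ["TX", "TX", "OR"],
})

def best_match(query, choices):
    """Hypothetical stand-in for process.extractOne: (best choice, score 0-100).

    Uses difflib.SequenceMatcher instead of fuzzywuzzy, so scores differ
    from fuzz.ratio.
    """
    scored = [
        (choice, round(100 * difflib.SequenceMatcher(None, query, choice).ratio()))
        for choice in choices
    ]
    return max(scored, key=lambda pair: pair[1])

def fuzzymatch_get_ratio(row):
    # Candidate cities for this row's state, lower-cased and de-duplicated.
    cities = (
        df_uszips.loc[df_uszips["state_id"] == row["State Code"], "city"]
        .str.lower()
        .unique()
    )
    match, score = best_match(row["City"].lower(), cities)
    # Naming the Series index lets apply() produce correctly named columns.
    return pd.Series({"new_city_name": match, "score": score})

matches = test.apply(fuzzymatch_get_ratio, axis=1)
test = test.join(matches)  # aligns on the shared row index
print(test)
```

Naming the Series index inside the function removes the separate test1.columns step; join then lines the new columns up with the original rows by index.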

Another option should also work:

test[['new_city_name', 'score']] = test.apply(fuzzymatch_get_ratio, axis=1)
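The SettingWithCopyWarning itself is about test, not about apply or zip: pandas raises it when you add columns to a DataFrame that it tracks as a slice of another one. The question does not show how test was created, so this sketch assumes it came from filtering a larger frame (df here is hypothetical). Taking an explicit .copy() when creating test makes it an independent DataFrame, after which any of the assignment styles above should stop warning.

```python
import warnings

import pandas as pd

# Hypothetical parent frame; assumes `test` was built by filtering something like it.
df = pd.DataFrame({
    "City": ["austn", "dalas", "portlnd"],
    "State Code": ["TX", "TX", "OR"],
})

def setting_with_copy_warnings(frame):
    """Add a column to `frame` and return any SettingWithCopyWarnings raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        frame["score"] = 0
    return [w for w in caught if w.category.__name__ == "SettingWithCopyWarning"]

# Plain boolean-mask slice: pandas 1.x/2.x typically warn on the write
# (pandas versions with Copy-on-Write enabled no longer emit this warning).
sliced = df[df["State Code"] == "TX"]
print(len(setting_with_copy_warnings(sliced)))

# Explicit copy: `test` no longer refers back to `df`, so no warning.
test = df[df["State Code"] == "TX"].copy()
print(len(setting_with_copy_warnings(test)))  # 0
```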

Error "A value is trying to be set on a copy of a slice from a DataFrame" when assigning values to new columns in a DataFrame

1 个答案: