I have a dataframe df that looks like this:
id colour response
1 blue curent
2 red loaning
3 yellow current
4 green loan
5 red currret
6 green loan
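As you can see, the values in the response column are inconsistent, and I would like to map them onto a standardized set of responses.
I also have a validation list, validate, that looks like this: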
validate
current
loan
transfer
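I want to standardize the response column in df based on the first three characters of the entries in the validation list,
so that the final output looks like this: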
id colour response
1 blue current
2 red loan
3 yellow current
4 green loan
5 red current
6 green loan
I tried using fnmatch:
pattern = 'cur*'
fnmatch.filter(df, pattern) = 'current'
but I could not get it to change the values in df.
If anyone can help, it would be greatly appreciated.
Thanks
Answer 0 (score: 2)
You can use map:
In [3664]: mapping = dict(zip(s.str[:3], s))
In [3665]: df.response.str[:3].map(mapping)
Out[3665]:
0 current
1 loan
2 current
3 loan
4 current
5 loan
Name: response, dtype: object
In [3666]: df['response2'] = df.response.str[:3].map(mapping)
In [3667]: df
Out[3667]:
id colour response response2
0 1 blue curent current
1 2 red loaning loan
2 3 yellow current current
3 4 green loan loan
4 5 red currret current
5 6 green loan loan
Here s is a Series of the validation values:
In [3650]: s
Out[3650]:
0 current
1 loan
2 transfer
Name: validate, dtype: object
Details:
In [3652]: mapping
Out[3652]: {'cur': 'current', 'loa': 'loan', 'tra': 'transfer'}
mapping can also be a Series, indexed by the three-character prefix:
In [3678]: pd.Series(s.values, index=s.str[:3].values)
Out[3678]:
cur current
loa loan
tra transfer
dtype: object
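For completeness, here is a self-contained sketch of the prefix-mapping approach above, rebuilt from the sample data in the question. The helper name `standardize` and its `n` parameter are illustrative and not part of the original answer, and falling back to the original value when no prefix matches is an assumed policy rather than something the question asks for.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "colour": ["blue", "red", "yellow", "green", "red", "green"],
    "response": ["curent", "loaning", "current", "loan", "currret", "loan"],
})
s = pd.Series(["current", "loan", "transfer"], name="validate")


def standardize(responses, valid, n=3):
    """Map each response to the valid entry sharing its first n characters.

    Hypothetical helper: responses whose prefix matches no valid entry
    are left unchanged (assumed policy) instead of becoming NaN.
    """
    # Prefix -> standard value, e.g. {'cur': 'current', 'loa': 'loan', ...}
    mapping = dict(zip(valid.str[:n], valid))
    return responses.str[:n].map(mapping).fillna(responses)


df["response2"] = standardize(df["response"], s)
```

Note that if two validation entries share the same first three characters, the dict keeps only the last one, so this approach relies on the prefixes being unique.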
Answer 1 (score: 0)
Fuzzy matching?
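from fuzzywuzzy import process
# val is not defined in this answer; it is presumably the DataFrame
# that holds the validate column from the question.
# extractOne returns the best match as a tuple whose first element
# is the matched string.
df['response2'] = [process.extractOne(x, val.validate)[0] for x in df.response]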
df
Out[867]:
id colour response response2
0 1 blue curent current
1 2 red loaning loan
2 3 yellow current current
3 4 green loan loan
4 5 red currret current
5 6 green loan loan
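If installing fuzzywuzzy is not an option, a similar effect can be sketched with the standard library's difflib. This is a different similarity measure from the one fuzzywuzzy uses, so results may differ on other data. The function name `closest` and the 0.6 cutoff are illustrative assumptions, not part of the answer above.

```python
import difflib

# Standard values from the question's validation list.
valid = ["current", "loan", "transfer"]


def closest(value, choices=valid, cutoff=0.6):
    """Return the entry in choices most similar to value.

    Hypothetical helper: if nothing scores at least `cutoff`,
    the original value is returned unchanged.
    """
    matches = difflib.get_close_matches(value, choices, n=1, cutoff=cutoff)
    return matches[0] if matches else value


responses = ["curent", "loaning", "current", "loan", "currret", "loan"]
standardized = [closest(r) for r in responses]
```

With a dataframe, the same function could be applied column-wise, for example `df['response2'] = df.response.map(closest)`.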