This is my data:
Id Keyword
1 ayam e-commerce
2 biaya fuel personal wallet
3 pulsa sms virtualaccount
4 biaya koperasi personal
5 familymart personal
6 e-commerce pln
7 biaya onus
8 koperasi personal
9 biaya familymart personal
10 fuel personal wallet
11 fuel travel
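I want every Keyword value that contains one of the keywords fuel, pln, or ayam to be shortened to just fuel, pln, or ayam, so that, for example, "ayam e-commerce" becomes "ayam".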
How should I do this?
Answer 0 (score: 1)
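To replace each value with only the first matching keyword (the first one in L that the value contains), use str.contains in a loop: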
L = ['fuel', 'pln', 'ayam']
for x in L:
df.loc[df['Keyword'].str.contains(x), 'Keyword'] = x
Or use a nested list comprehension:
L = ['fuel', 'pln', 'ayam']
df['Keyword'] = [next(iter([z for z in L if z in x]), x) for x in df['Keyword']]
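Note that both str.contains and the in operator test for substrings, so a keyword also matches when it sits inside a longer word. Here is a minimal sketch of the difference. The rows "plnx topup" and "refuel card" are hypothetical and not part of the question's data, and the whitespace-split whole-word check is an assumed variation, not something from the original answer:

```python
import pandas as pd

# Hypothetical rows (not from the question): "plnx" and "refuel" only
# contain the keywords as substrings, not as whole words.
df = pd.DataFrame({"Keyword": ["e-commerce pln", "plnx topup", "refuel card"]})
L = ["fuel", "pln", "ayam"]

# Substring test, same logic as the list comprehension above.
substring = [next(iter([z for z in L if z in x]), x) for x in df["Keyword"]]

# Whole-word test: split each value on whitespace and compare tokens.
whole_word = [next((z for z in L if z in x.split()), x) for x in df["Keyword"]]

print(substring)   # ['pln', 'pln', 'fuel']
print(whole_word)  # ['pln', 'plnx topup', 'refuel card']
```

If the real data can contain such longer words, a whole-word check (or a regex with word boundaries) is probably the safer choice.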
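Or use str.extract with a pattern built from the keywords and word boundaries (\b), falling back to the original value where nothing matches:
L = ['fuel', 'pln', 'ayam']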
pat = '|'.join(r"\b{}\b".format(x) for x in L)
df['Keyword'] = df['Keyword'].str.extract('('+ pat + ')', expand=False).fillna(df['Keyword'])
print (df)
Id Keyword
0 1 ayam
1 2 fuel
2 3 pulsa sms virtualaccount
3 4 biaya koperasi personal
4 5 familymart personal
5 6 pln
6 7 biaya onus
7 8 koperasi personal
8 9 biaya familymart personal
9 10 fuel
10 11 fuel
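For anyone who wants to run this end to end, here is a self-contained sketch of the str.extract approach that rebuilds the question's data. The DataFrame construction and the use of re.escape are assumptions added for illustration. re.escape only matters if a keyword contains regex metacharacters:

```python
import re
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Id": range(1, 12),
    "Keyword": [
        "ayam e-commerce", "biaya fuel personal wallet", "pulsa sms virtualaccount",
        "biaya koperasi personal", "familymart personal", "e-commerce pln",
        "biaya onus", "koperasi personal", "biaya familymart personal",
        "fuel personal wallet", "fuel travel",
    ],
})
L = ["fuel", "pln", "ayam"]

# Assumption: escape each keyword so regex metacharacters are taken literally.
pat = "|".join(r"\b{}\b".format(re.escape(x)) for x in L)

# extract returns NaN where nothing matches; fillna restores the original text.
result = df["Keyword"].str.extract("(" + pat + ")", expand=False).fillna(df["Keyword"])
print(result.tolist())
```

One difference to keep in mind: str.extract returns the keyword that appears first in the string, while the str.contains loop keeps the keyword that comes first in L. For the data above both give the same result.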
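If you need all matched values, use str.findall with str.join, then use loc to overwrite only the rows where the joined result is non-empty, so rows without a match keep their original value: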
print (df)
Id Keyword
0 1 ayam e-commerce
1 2 biaya fuel pln wallet <- matched 2 keywords
2 3 pulsa sms virtualaccount
pat = '|'.join(r"\b{}\b".format(x) for x in L)
s = df['Keyword'].str.findall('('+ pat + ')').str.join(', ')
df.loc[s != '', 'Keyword'] = s
print (df)
Id Keyword
0 1 ayam
1 2 fuel, pln
2 3 pulsa sms virtualaccount
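The same idea as a self-contained sketch. The DataFrame construction is an assumption that rebuilds the three sample rows shown above:

```python
import pandas as pd

# The three sample rows from the example above.
df = pd.DataFrame({
    "Id": [1, 2, 3],
    "Keyword": ["ayam e-commerce", "biaya fuel pln wallet", "pulsa sms virtualaccount"],
})
L = ["fuel", "pln", "ayam"]
pat = "|".join(r"\b{}\b".format(x) for x in L)

# findall yields a list of matches per row; str.join turns each list into
# one string, which is empty when nothing matched.
s = df["Keyword"].str.findall("(" + pat + ")").str.join(", ")

# Overwrite only the rows that had at least one match.
df.loc[s != "", "Keyword"] = s
print(df["Keyword"].tolist())
```

Rows where s is an empty string are skipped by the boolean mask, which is why the third row keeps its original text.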