When I run this code to edit my CSV file, some strings are only partially replaced, even though the full string is a key in my dictionary.
import re

def replace_all(text, dic):
    for i, j in dic.iteritems():
        text = text.replace(i, j)
    return text

bottle = "vial jug canteen urn jug33"
transport = "car automobile airplane scooter"

mydict = {}
for word in bottle.split():
    mydict[word] = 'bottle'
for word in transport.split():
    mydict[word] = 'transport'
print(mydict)  # test

with open('replacesample.csv', 'r') as f:
    text = f.read()

text = replace_all(text, mydict)
text = re.sub(r'PROD\s(?=[1-9])', r'PROD', text)

with open('file2.csv', 'w') as w:
    w.write(text)
For example, if my starting CSV looks like this:
jug
canteen
urn
car
automobile
swag
airplane
jug33
I end up with:
bottle
bottle
bottle
transport
transport
swag
transport
bottle33
How can I fix this?
Expected:
bottle
bottle
bottle
transport
transport
swag
transport
bottle
Answer 0 (score: 0)
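You are using a dictionary to enumerate the replacement patterns, and (in Python 2) a dictionary returns its keys and values in arbitrary order.
As a result, the jug -> bottle replacement can run before the jug33 -> bottle replacement. Because str.replace() also matches inside longer words, jug33 has already become bottle33 by that point, and the jug33 key no longer matches anything.
The solution is to sort the keys by length in descending order, so that longer matches are replaced first: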
def replace_all(text, dic):
    for i, j in sorted(dic.iteritems(), key=lambda i: len(i[0]), reverse=True):
        text = text.replace(i, j)
    return text
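On Python 3, dict.iteritems() no longer exists, so the same idea can be sketched with dict.items(). The name replace_all_py3 is made up here for illustration, and the dictionary is rebuilt the same way as in the question. Since Python 3.7 dictionaries keep insertion order, so the unsorted loop would fail there every time (jug is inserted before jug33) rather than at random; sorting by key length is still what fixes it.

```python
def replace_all_py3(text, dic):
    # Hypothetical Python 3 port: items() instead of iteritems().
    # Longest keys first, so 'jug33' is replaced before 'jug' can clobber it.
    for old, new in sorted(dic.items(), key=lambda kv: len(kv[0]), reverse=True):
        text = text.replace(old, new)
    return text


# Same mapping as in the question.
mydict = {}
for word in "vial jug canteen urn jug33".split():
    mydict[word] = 'bottle'
for word in "car automobile airplane scooter".split():
    mydict[word] = 'transport'

result = replace_all_py3("jug\ncanteen\nswag\njug33\n", mydict)
print(result)
```

With this ordering, jug33 is consumed as a whole before the shorter jug key is tried, so no stray 33 is left behind.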
Demo:
>>> def replace_all(text, dic):
...     for i, j in dic.iteritems():
...         text = text.replace(i, j)
...     return text
...
>>> replace_all('jug33 jug', mydict)
'bottle33 bottle'
>>> def replace_all(text, dic):
...     for i, j in sorted(dic.iteritems(), key=lambda i: len(i[0]), reverse=True):
...         text = text.replace(i, j)
...     return text
...
>>> replace_all('jug33 jug', mydict)
'bottle bottle'
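Sorting by length fixes the jug33 case, but str.replace() still matches inside longer words (for example, car inside cargo). If that matters, one alternative, which is not part of the answer above, is a single regular-expression pass with word boundaries. The following is a minimal Python 3 sketch; the function name replace_words and the sample mapping are illustrative assumptions, and it assumes a non-empty dictionary whose keys start and end with word characters (letters, digits, or underscore).

```python
import re


def replace_words(text, dic):
    # Hypothetical helper: replace whole words only, in a single pass.
    # Longest keys first so the alternation prefers 'jug33' over 'jug'.
    keys = sorted(dic, key=len, reverse=True)
    # re.escape guards against keys containing regex metacharacters.
    pattern = re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in keys) + r')\b')
    # The callback looks up the replacement for whichever key matched.
    return pattern.sub(lambda m: dic[m.group(0)], text)


sample = {'jug': 'bottle', 'jug33': 'bottle', 'car': 'transport'}
print(replace_words('jug33 jug cargo car', sample))
```

Because the text is scanned only once, replacement output is never re-examined, so one substitution cannot feed into another.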