Question

When I run this code to edit my CSV file, only part of some strings gets replaced, even though the full strings are in my dictionary.

import re

def replace_all(text, dic):
    for i, j in dic.iteritems():
        text = text.replace(i, j)
    return text

bottle = "vial jug canteen urn jug33"
transport = "car automobile airplane scooter"

mydict = {}
for word in bottle.split():
    mydict[word] = 'bottle'
for word in transport.split():
    mydict[word] = 'transport'
print(mydict) # test


with open('replacesample.csv','r') as f:
    text=f.read()
    text=replace_all(text,mydict)
    text=re.sub(r'PROD\s(?=[1-9])',r'PROD',text)

with open('file2.csv','w') as w:
    w.write(text)

For example, if my starting CSV looks like this:

jug 
canteen 
urn
car
automobile
swag
airplane
jug33

I end up with:

bottle 
bottle 
bottle
transport
transport
swag
transport
bottle33

How can I fix this?

Expected:

bottle 
bottle 
bottle
transport
transport
swag
transport
bottle

Answer 1

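You are using a dictionary to enumerate the replacement patterns. In Python 2 (which this code targets, given iteritems()), dictionaries return keys and values in arbitrary order.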

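As a result, the jug -> bottle replacement can happen before the jug33 -> bottle replacement. Because str.replace() also matches partial words, the jug inside jug33 is replaced and the 33 is left behind.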
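The order dependence can be reproduced deterministically by passing the pairs as a list instead of a dictionary. This is a minimal sketch in Python 3; replace_in_order is a hypothetical helper introduced for illustration, not part of the original code:

```python
def replace_in_order(text, pairs):
    # Apply each (old, new) replacement in exactly the order given.
    for old, new in pairs:
        text = text.replace(old, new)
    return text

# 'jug' first: it also matches the start of 'jug33', leaving the '33' behind.
short_first = replace_in_order('jug33 jug', [('jug', 'bottle'), ('jug33', 'bottle')])

# 'jug33' first: the longer key is consumed before 'jug' is tried.
long_first = replace_in_order('jug33 jug', [('jug33', 'bottle'), ('jug', 'bottle')])

print(short_first)  # bottle33 bottle
print(long_first)   # bottle bottle
```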

The solution is to sort the keys by length in descending order, so that longer matches are replaced first:

def replace_all(text, dic):
    for i, j in sorted(dic.iteritems(), key=lambda i: len(i[0]), reverse=True):
        text = text.replace(i, j)
    return text

Demo:

>>> def replace_all(text, dic):
...     for i, j in dic.iteritems():
...         text = text.replace(i, j)
...     return text
...
>>> replace_all('jug33 jug', mydict)
'bottle33 bottle'
>>> def replace_all(text, dic):
...     for i, j in sorted(dic.iteritems(), key=lambda i: len(i[0]), reverse=True):
...         text = text.replace(i, j)
...     return text
...
>>> replace_all('jug33 jug', mydict)
'bottle bottle'
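On Python 3, dict.iteritems() no longer exists; dict.items() is the equivalent. The sketch below ports the sorted version and runs it on the sample data from the question. The in-memory sample string stands in for the contents of replacesample.csv and is an assumption made for illustration:

```python
def replace_all(text, dic):
    # Longest keys first, so 'jug33' is replaced before 'jug'.
    for old, new in sorted(dic.items(), key=lambda kv: len(kv[0]), reverse=True):
        text = text.replace(old, new)
    return text

bottle = "vial jug canteen urn jug33"
transport = "car automobile airplane scooter"

mydict = {}
for word in bottle.split():
    mydict[word] = 'bottle'
for word in transport.split():
    mydict[word] = 'transport'

# Assumed stand-in for the file contents shown in the question.
sample = "jug\ncanteen\nurn\ncar\nautomobile\nswag\nairplane\njug33\n"
result = replace_all(sample, mydict)
print(result)
```

Sorting by length only fixes keys that are prefixes or substrings of other keys. A key can still match inside an unrelated longer word (for example, car inside carpet), which would need word-boundary matching rather than plain str.replace().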
