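Replace certain expressions in a file, but only once each

Time: 2014-12-15 12:19:45

Tags: python

I have extracted some expressions from a file, and I want to insert them back into the same file in a different format, for example between brackets. My problem is that I want each expression to be replaced only once. The file looks like this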

file = """he is a good man
she is a beautiful woman
this is a clever student
he is a bad neighbour
they are bad men
She is very beautiful"""

and the expressions look like this

ex = """ good, clever, beautiful, bad,"""

The code I used is

adj =  ex.split(",") 
for a in adj:
  if a in file:
     file = file.replace(a, ' ' +'[[' + a + ']]')
print file

This gives the following output:

he is a  [[good]] man [[
]]she is a [[ beautiful]] woman [[
]]this is a [[ clever]] student [[
]]he is a [[ bad]] neighbour [[
]]they are [[ bad]] men [[
]]She is very [[ beautiful]] [[
]] [[
]]

whereas the expected output is

he is a  [[good]] man 
she is a [[ beautiful]] woman 
this is a [[ clever]] student 
he is a [[ bad]] neighbour 
they are bad men # so here "bad" will not be replaced because there is another 'bad' replaced 
She is very beautiful # and here 'beautiful' will not be replaced like 'bad'

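3 Answers:

Answer 0: (score: 1)

If the file contents are stored as a string

The string replace method also accepts an optional third argument, which the tutorial linked below calls max (the official Python documentation calls it count).

http://www.tutorialspoint.com/python/string_replace.htm

It limits how many occurrences of the word are replaced, counting from the start of the string.

For example,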

>>> "he is a good man, and a good husband".replace('good', '[[ good ]]', 1)
'he is a [[ good ]] man, and a good husband'
>>>
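
Applied to the whole file held as one string, the same idea can be sketched as follows. This is only a minimal sketch, assuming Python 3; the helper name bracket_first_occurrence is made up for illustration, and it matches plain substrings, not whole words.

```python
def bracket_first_occurrence(text, terms):
    """Wrap only the first occurrence of each term in [[ ]]."""
    for term in terms:
        # The third argument caps the number of replacements at one.
        text = text.replace(term, '[[' + term + ']]', 1)
    return text


text = """he is a good man
she is a beautiful woman
this is a clever student
he is a bad neighbour
they are bad men
She is very beautiful"""

print(bracket_first_occurrence(text, ['good', 'clever', 'beautiful', 'bad']))
```

Because the count applies to the whole string, the second "bad" and the second "beautiful" further down are left untouched.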

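Hang on, I am working on your example now.

Example 2: reading from a file, one line at a time.

In the method above, I assumed you had already read the file and stored its contents as a single string. In this second example, I will show you how to write code that solves your problem.

Suppose you have a file testfile.txt with the following contents: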

he is a good man
she is a beautiful woman
this is a clever student
he is a bad neighbour
they are bad men
She is very beautiful

Here is your Python code

#!/usr/bin/env python

# your expressions
ex = """ good, clever, beautiful, bad,"""

# list comprehension to clean up the expressions:
# split on commas, strip whitespace, and drop anything that is left empty
wanted_terms = [x.strip() for x in ex.split(',') if x.strip() != '']

## read the file using a with statement
with open('testfile.txt') as f:
    for line in f:
        line = line.strip()
        ## for each wanted term, check whether it exists in the line;
        ## iterate over a copy because the list is modified inside the loop
        for x in wanted_terms[:]:
            if x in line:
                ## I prefer to use string formatting here.
                #replacement = "[[ %s ]]" % x 
                #line = line.replace(x, replacement, 1)

                ## if the term exists, do the replacement. The third argument (1) ensures only the first instance is replaced.
                line = line.replace(x, '[[' + x +']]', 1 )
                ## remove it from the term list so that later occurrences are left alone
                wanted_terms.remove(x)
        ## show the (possibly modified) line
        print(line)
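
One limitation of a plain substring test is that a term such as bad also matches inside a longer word such as badly. If whole-word matching is wanted, a regular expression with \b boundaries is one option. The sketch below is a variant under stated assumptions, not the code from this answer: it assumes Python 3, works on a list of lines instead of an open file, and the function name mark_first_whole_word is hypothetical.

```python
import re


def mark_first_whole_word(lines, terms):
    """Bracket the first whole-word occurrence of each term across all lines."""
    remaining = set(terms)  # terms that have not been bracketed yet
    out = []
    for line in lines:
        # Iterate over a sorted copy so the set can shrink safely inside the loop.
        for term in sorted(remaining):
            pattern = r'\b' + re.escape(term) + r'\b'
            # A function replacement avoids backslash handling in the new text.
            new_line, n = re.subn(pattern, lambda m: '[[' + m.group(0) + ']]',
                                  line, count=1)
            if n:
                line = new_line
                remaining.discard(term)
        out.append(line)
    return out


lines = ['he is badly hurt', 'he is a bad neighbour', 'they are bad men']
for marked in mark_first_whole_word(lines, ['bad']):
    print(marked)
```

Here "badly" is skipped, the first standalone "bad" is bracketed, and the later one is left alone.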

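Let me know whether you find this useful or have any other comments,

Cheers, Biobirdman

Answer 1: (score: 0)

biobirdman seems to have a good solution, so use that to do things properly. My post here is only to explain what went wrong. When you do this: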

ex = """ good, clever, beautiful, bad,"""
adj =  ex.split(",") 

what you get is not what you think

print adj
[' good', ' clever', ' beautiful', ' bad', '']

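I don't know whether you meant to have a space before each string, but you almost certainly did not mean to have the '' at the end. In fact, I don't think your real data matched this example, or you would have seen a different kind of bad behaviour. I think your ex ended with a newline character, so the element shown here as '' was actually a newline in what you ran.

So it matched everything you expected, plus every newline in the file. Anyone who runs the code exactly as you posted it will instead get a match between every pair of characters.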

[[]]h [[]]e [[]]  [[]]i [[]]s [[]]  [[]]a  ........
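
The reason is that the empty string counts as a substring of every string, so the in test succeeds and replace inserts the new text at every position. A short check, assuming Python 3 (the variable names are illustrative only):

```python
sample = "he is"

# The empty string is found in any string, so an `if a in file` guard passes.
print('' in sample)

# Replacing '' inserts the new text before every character and at the end.
spaced = sample.replace('', ' [[]]')
print(spaced)
```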

The fix: get rid of the newline. Get rid of the extra spaces. How? Take a look at strip.
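
A minimal sketch of that clean-up, assuming Python 3 and assuming that ex really did end with a newline, which is my guess and not something the question confirms:

```python
ex = " good, clever, beautiful, bad,\n"

# strip() removes surrounding spaces and newlines; the filter drops empty pieces.
adj = [piece.strip() for piece in ex.split(',') if piece.strip()]
print(adj)
```

With the whitespace-only and empty pieces gone, neither the newline nor the empty string can match anything in the file.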

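Answer 2: (score: 0)

Make two changes to your code: drop the empty string from adj, and strip the leading space when replacing word with [[word]]. Here word stands for values such as "beautiful" or "clever" in your code.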

file = """he is a good man
she is a beautiful woman
this is a clever student
he is a bad neighbour
they are bad men
She is very beautiful"""

ex = """ good, clever, beautiful, bad,"""

adj = filter(None, ex.split(","))    # removing empty strings from list
# SO ref: http://stackoverflow.com/questions/3845423/remove-empty-strings-from-a-list-of-strings

for a in adj:
    if a in file:
        file = file.replace(a, ' ' +'[[' + a.strip() + ']]')    # strip() removes leading or trailing whitespaces

print file