Concatenate parts of subsequent substrings and delete those substrings

Date: 2014-07-24 01:10:15

Tags: python regex

I'm new to Python. I have a text file in which I need to concatenate the strings inside () and then delete the ones I concatenated.

text.txt:

Car(skoda,benz,bmw,audi)
The above mentioned cars are sedan type and gives long rides efficient
......

Car(Rangerover,Hummer)
SUV cars are used for family time and spacious.

Expected output:

Car(skoda,benz,bmw,audi,Rangerover,Hummer)
The above mentioned cars are sedan type and gives long rides efficient
......
SUV cars are used for family time and spacious.

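Here the contents of the later Car(...) should be added inside the parentheses of the first Car, and then the line I concatenated from should be deleted.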

Code:

f_in=open("text.txt", "r")      
in_lines=f_in.readlines()           
out=[]
for line in in_lines:
    list_values=line.split()       
    for 'Car' in line:
        Car[i]=eval(list_values[i])    
        if Car[i] in line:     
            str(Car+Car[i])  # I'm stuck and my overall logic is getting worse

Please help me get the desired output. Being inexperienced, I don't know the simplest way to do this. Answers would be greatly appreciated.

2 Answers:

Answer 0 (score: 1)

Tricky replace

Search:

(?s)^(Car\([^),]+(,)[^)]*)(?=.*?Car\(([^)]+)\))|(?!^)Car\([^)]*\)[\r\n]*

Replace:

\1\2\3

In the Regex Demo, see the substitution at the bottom.
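
To see how the search pattern works, here is the same expression rewritten with `re.VERBOSE` so that each part carries a comment. This is an illustrative sketch: `PATTERN` and `text` are names introduced here, and it relies on Python 3.5 or later, where `re.sub` substitutes an empty string for a group that did not take part in the match.

```python
import re

# The search pattern above, spelled out with re.VERBOSE (re.DOTALL replaces the inline (?s))
PATTERN = re.compile(r"""
    ^(                        # group 1: the first Car(...) group, minus its closing paren
        Car\( [^),]+          #   Car( plus its first item
        (,)                   #   group 2: the first comma, reused as the joiner
        [^)]*                 #   the remaining items, stopping before the closing paren
    )
    (?=.*?Car\(([^)]+)\))     # lookahead: group 3 = contents of the next Car(...) group
    |                         # or
    (?!^)Car\([^)]*\)[\r\n]*  # any later Car(...) group plus its trailing line breaks
""", re.DOTALL | re.VERBOSE)

text = ("Car(skoda,benz,bmw,audi)\n"
        "The above mentioned cars are sedan type and gives long rides efficient\n"
        "......\n"
        "\n"
        "Car(Rangerover,Hummer)\n"
        "SUV cars are used for family time and spacious.\n")

# Python 3.5+: groups that did not participate are replaced by ""
print(PATTERN.sub(r"\1\2\3", text))
```

For a match of the first alternative, `\1\2\3` rebuilds the first group with the next group's contents appended and leaves its closing parenthesis untouched; for a match of the second alternative all three groups are empty, so the later group is simply deleted.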

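This handles two Car definitions. With more than two, rerunning the replacement until the result stops changing does not help: the second alternative deletes every later Car(...) group in the same pass, so only the second group's contents are merged into the first.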

Python code example:

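import re

with open("text.txt") as f:  # or paste your original string here
    result = f.read()
subject = ""
# Needs Python 3.5+, where unmatched groups in the replacement become ""
while result != subject:
    subject = result
    result = re.sub(r"(?s)^(Car\([^),]+(,)[^)]*)(?=.*?Car\(([^)]+)\))|(?!^)Car\([^)]*\)[\r\n]*",
                    r"\1\2\3",
                    subject)
print(result)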
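
For files with any number of Car(...) groups, here is a minimal sketch that avoids the lookahead: collect the contents of every group first, rebuild the first group from them, then delete the later groups together with the line breaks in front of them. `merge_car_groups`, `CAR_RE`, `LATER_CAR_RE` and `sample` are hypothetical names, and the sketch assumes no group contains nested parentheses.

```python
import re

# Contents of one Car(...) group (assumes no nested parentheses)
CAR_RE = re.compile(r"Car\(([^)]*)\)")
# A later Car(...) group together with the line break(s) just before it
LATER_CAR_RE = re.compile(r"[\r\n]*Car\([^)]*\)")


def merge_car_groups(text):
    """Append every later Car(...) group's contents to the first group,
    then delete the later groups. Works for any number of groups."""
    first = CAR_RE.search(text)
    if first is None:
        return text  # no Car(...) group at all
    merged = "Car({})".format(",".join(CAR_RE.findall(text)))
    head = text[:first.start()] + merged
    # Remove every remaining group (and the line breaks before it) after the first one
    tail = LATER_CAR_RE.sub("", text[first.end():])
    return head + tail


sample = ("Car(skoda,benz,bmw,audi)\n"
          "The above mentioned cars are sedan type and gives long rides efficient\n"
          "......\n"
          "\n"
          "Car(Rangerover,Hummer)\n"
          "SUV cars are used for family time and spacious.\n")
print(merge_car_groups(sample))
```

Deleting the line breaks in front of each later group, rather than after it, is what also drops the blank line above the second group, so the result lines up with the expected output in the question.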

Answer 1 (score: 0)

You can use re to find all the cars, then write out every line except the ones containing Car(...):

import re

comp = re.compile(r'Car\(([^)]*)\)')  # capture the text inside each Car(...)
with open("in.txt") as f, open("amended.txt", "w") as f1:
    lines = f.read()  # read the whole file into one string
    cars = comp.findall(lines)  # contents of every Car(...) group, in order
    joined = ",".join(cars)  # join all cars inside one set of parens
    f1.write("Car({})\n".format(joined))  # write the merged cars as the first line
    f.seek(0)  # go back to the start of the file
    for line in f:
        if "Car(" not in line:  # skip every line containing Car(...)
            f1.write(line)

Output:

Car(skoda,benz,bmw,audi,Rangerover,Hummer)
The above mentioned cars are sedan type and gives long rides efficient
SUV cars are used for family time and spacious.
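
The same find-then-filter idea also works on a list of lines read once, which keeps the merged group at the position of the first Car(...) line instead of always writing it at the top. A minimal sketch, assuming every Car(...) group sits on its own line; `merge_cars` is a hypothetical helper name.

```python
import re

CAR_RE = re.compile(r"Car\(([^)]*)\)")


def merge_cars(lines):
    """Hypothetical helper: merge all Car(...) lines into the first one.

    Assumes each Car(...) group sits on its own line, as in text.txt.
    """
    # Contents of every Car(...) group across all lines, in order
    cars = [m.group(1) for line in lines for m in CAR_RE.finditer(line)]
    out = []
    merged_written = False
    for line in lines:
        if "Car(" not in line:
            out.append(line)  # ordinary text line: keep it unchanged
        elif not merged_written:
            # the first Car(...) line is replaced by the merged group
            out.append("Car({})\n".format(",".join(cars)))
            merged_written = True
        # any later Car(...) line is skipped
    return out
```

To use it like the answer above, read `in.txt` with `readlines()` and write the returned list to `amended.txt` with `writelines()`.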