Python: Compare strings across 2 text files, then print the context

Time: 2013-01-31 00:56:11

Tags: python string text find

I have 2 text files:

1) cities.txt

San Francisco
Los Angeles
Seattle
Dallas

2) master.txt

Atlanta is chill and laid-back.
I love Los Angeles.
Coming to Dallas was the right choice.
New York is so busy!
San Francisco is fun.
Moving to Boston soon!
Go to Seattle in the summer.

I'm trying to get output.txt:

<main><beg>I love</beg><key>Los Angeles</key><end></end></main>
<main><beg>Coming to</beg><key>Dallas</key><end>was the right choice</end></main>
<main><beg></beg><key>San Francisco</key><end>is fun</end></main>
<main><beg>Go to</beg><key>Seattle</key><end>in the summer</end></main>

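Each entry in cities.txt is a <key>. The master.txt file is much longer, and every line that does not contain one of the cities should be ignored. The lines are not in any particular order. The output prints the city inside <key>, along with the <beg> and <end> context (if any).

This is what I have: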

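# File names must be quoted strings; open(master.txt) raises a NameError.
with open('master.txt') as f:
    master = f.read()
working = []
with open('cities.txt') as f:
    for i in (word.strip() for word in f):
        if i in master:
            # Formatting the string first works on both Python 2 and Python 3.
            print('<key> %s </key>' % i)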

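I know how to check between the two text files (find a 'city' in 'master')... but I'm stuck on how to print the <beg> and <end> context from master.txt once I've found the city!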

2 answers:

Answer 0 (score: 1)

The following should help with what you need. It works on both Python 2 and Python 3.

#!/usr/bin/python

import os

def parse(line, city):
    start = line.find(city)
    end = start + len(city)
    # Following is a simple implementation. I haven't parsed for spaces
    # and punctuations around tags.
    return '<main><beg>' + line[:start] + '</beg><key>' + city + '</key><end>' \
           + line[end:] + '</end></main>'

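# Read both files inside with-blocks so the file handles are closed promptly.
with open(os.getcwd() + '/master.txt', 'r') as f:
    master = [line.strip() for line in f]
with open(os.getcwd() + '/cities.txt', 'r') as f:
    cities = [line.strip() for line in f]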
data = []

for line in master:
    for city in cities:
        if city in line:
            data.append(parse(line, city))

# Following would overwrite output.txt file in the current working directory
with open(os.getcwd() + '/output.txt', 'w') as foo:
    for output in data:
        foo.write(output + '\n')
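
The comment in parse() notes that spaces and punctuation around the tags are left as they are, so the result differs slightly from the output.txt requested in the question. Here is a minimal sketch of one way to trim them. The name parse_trimmed is hypothetical, and it assumes that trailing '.' and '!' are the only punctuation to drop, which is a guess based on the sample master.txt.

```python
def parse_trimmed(line, city):
    """Split line around the first occurrence of city and trim both sides."""
    # str.partition returns (text before, separator, text after).
    beg, _, end = line.partition(city)
    beg = beg.strip()
    # Assumption: only trailing '.' and '!' count as punctuation to remove.
    end = end.strip().rstrip('.!').strip()
    return ('<main><beg>' + beg + '</beg><key>' + city + '</key><end>'
            + end + '</end></main>')


print(parse_trimmed('Coming to Dallas was the right choice.', 'Dallas'))
```

Swapping this in for parse() in the loop above should leave the rest of the script unchanged.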

Answer 1 (score: 1)

This should work too, tested with Python 2.6:

cities_dict = {}
with open('master.txt', 'r') as master_in:
    with open('cities.txt') as city_in:
        for city in city_in:
            cities_dict[city.strip()] = '</beg><key>'+city.strip()+'</key><end>'

    for line in master_in:
        # items() and print(...) behave the same here on Python 2.6+ and Python 3.
        for key, val in cities_dict.items():
            if key in line:
                line_out = '<main><beg>' + line.replace(key, val).replace('!', '.').replace('.', '').strip('\n') + '</end></main>'
                print(line_out)

Output:

<main><beg>I love </beg><key>Los Angeles</key><end></end></main>
<main><beg>Coming to </beg><key>Dallas</key><end> was the right choice</end></main>
<main><beg></beg><key>San Francisco</key><end> is fun</end></main>
<main><beg>Go to </beg><key>Seattle</key><end> in the summer</end></main>
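
Both answers rely on the plain `in` substring test, which would also match a city name embedded in a longer word, and the replace() chain here removes every '.' and '!' in the line rather than only the trailing ones. As a rough alternative, the sketch below compiles all city names into one regular expression with word boundaries and trims the context on each side. The function names build_city_pattern and tag_lines are hypothetical, and treating only trailing '.' and '!' as punctuation is an assumption taken from the sample data.

```python
import re


def build_city_pattern(cities):
    """Compile one regex that matches any of the given city names."""
    # Longest names first, so a longer name wins over a shorter one it contains.
    ordered = sorted(cities, key=len, reverse=True)
    alternation = '|'.join(re.escape(city) for city in ordered)
    return re.compile(r'\b(' + alternation + r')\b')


def tag_lines(lines, cities):
    """Return tagged strings for the lines that mention a known city."""
    cities = [city for city in cities if city]  # skip blank entries
    if not cities:
        return []
    pattern = build_city_pattern(cities)
    tagged = []
    for line in lines:
        match = pattern.search(line)
        if match is None:
            continue  # lines without a known city are ignored
        beg = line[:match.start()].strip()
        # Assumption: only trailing '.' and '!' are dropped from the context.
        end = line[match.end():].strip().rstrip('.!').strip()
        tagged.append('<main><beg>%s</beg><key>%s</key><end>%s</end></main>'
                      % (beg, match.group(1), end))
    return tagged


sample_cities = ['San Francisco', 'Los Angeles', 'Seattle', 'Dallas']
sample_master = ['Atlanta is chill and laid-back.', 'I love Los Angeles.',
                 'San Francisco is fun.']
for row in tag_lines(sample_master, sample_cities):
    print(row)
```

Only the first city found on each line is tagged; lines that mention several cities would need pattern.finditer() instead of pattern.search().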