I have 2 text files:
1) cities.txt
San Francisco
Los Angeles
Seattle
Dallas
2) master.txt
Atlanta is chill and laid-back.
I love Los Angeles.
Coming to Dallas was the right choice.
New York is so busy!
San Francisco is fun.
Moving to Boston soon!
Go to Seattle in the summer.
I am trying to get output.txt:
<main><beg>I love</beg><key>Los Angeles</key><end></end></main>
<main><beg>Coming to</beg><key>Dallas</key><end>was the right choice</end></main>
<main><beg></beg><key>San Francisco</key><end>is fun</end></main>
<main><beg>Go to</beg><key>Seattle</key><end>in the summer</end></main>
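Each entity in cities.txt is the <key>. The master.txt file is much longer, and every line that does not contain one of the cities should be ignored. The lines are not in order. The output prints the city in <key> and the <beg> and <end> context (if any).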
This is what I have:
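# Read the whole master file into one string
with open('master.txt') as f:
    master = f.read()
working = []
with open('cities.txt') as f:
    # Check each city name (one per line) against the master text
    for i in (word.strip() for word in f):
        if i in master:
            print("<key> " + i + " </key>")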
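I know how to check between the two text files (find 'city' in 'master')... but I am stuck on how to print the <beg> and <end> context from master.txt once I have found the city!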
Answer 0 (score: 1)
The following should help you get what you need. It works on both Python 2 and Python 3.
#!/usr/bin/python
import os

def parse(line, city):
    start = line.find(city)
    end = start + len(city)
    # Following is a simple implementation. I haven't parsed for spaces
    # and punctuations around tags.
    return '<main><beg>' + line[:start] + '</beg><key>' + city + '</key><end>' \
        + line[end:] + '</end></main>'

master = [line.strip() for line in open(os.getcwd() + '/master.txt', 'r')]
cities = [line.strip() for line in open(os.getcwd() + '/cities.txt', 'r')]
data = []

for line in master:
    for city in cities:
        if city in line:
            data.append(parse(line, city))

# Following would overwrite output.txt file in the current working directory
with open(os.getcwd() + '/output.txt', 'w') as foo:
    for output in data:
        foo.write(output + '\n')
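As the comment in parse() notes, the version above leaves the surrounding spaces and the trailing punctuation inside the tags. Below is a minimal sketch of one way to trim them so the result matches the output.txt in the question. The name parse_trimmed is hypothetical and not part of the answer, and the sketch assumes it is acceptable to strip any whitespace or ASCII punctuation from both ends of each context piece.

```python
import string

def parse_trimmed(line, city):
    """Wrap `city` and its context in tags, trimming spaces and punctuation.

    Hypothetical variant of parse(); assumes `city` occurs in `line`.
    """
    # partition() splits the line around the first occurrence of the city
    before, _, after = line.partition(city)
    # Characters removed from both ends of each context piece (an assumption)
    junk = string.whitespace + string.punctuation
    return ('<main><beg>' + before.strip(junk) + '</beg><key>' + city
            + '</key><end>' + after.strip(junk) + '</end></main>')

print(parse_trimmed('Coming to Dallas was the right choice.', 'Dallas'))
# <main><beg>Coming to</beg><key>Dallas</key><end>was the right choice</end></main>
```

It could be used in place of parse() in the loop above. Note that stripping all punctuation would also remove a leading quote or bracket, which may or may not be wanted.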
Answer 1 (score: 1)
This should work too, tested with Python 2.6:
cities_dict = {}
with open('master.txt', 'r') as master_in:
    with open('cities.txt') as city_in:
        for city in city_in:
            cities_dict[city.strip()] = '</beg><key>'+city.strip()+'</key><end>'
    for line in master_in:
        for key,val in cities_dict.iteritems():
            if key in line:
                line_out= '<main><beg>'+line.replace(key,val).replace('!','.').replace('.','').strip('\n')+'</end></main>'
                print line_out
Output:
<main><beg>I love </beg><key>Los Angeles</key><end></end></main>
<main><beg>Coming to </beg><key>Dallas</key><end> was the right choice</end></main>
<main><beg></beg><key>San Francisco</key><end> is fun</end></main>
<main><beg>Go to </beg><key>Seattle</key><end> in the summer</end></main>
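Both answers test every city against every line. Since master.txt is described as much longer, here is a hedged sketch of an alternative that was not in either answer: compile all city names into one regular expression and search each line once. The names build_pattern and tag_line are hypothetical, the city list is assumed to be non-empty, and only the first city found on a line is tagged (the answers above emit one row per matching city).

```python
import re

def build_pattern(cities):
    # Longest names first, so a longer city wins over a shorter one that is
    # its prefix; re.escape guards against regex metacharacters in names.
    # Assumes `cities` is non-empty (an empty pattern would match every line).
    ordered = sorted(cities, key=len, reverse=True)
    return re.compile('|'.join(re.escape(c) for c in ordered))

def tag_line(line, pattern):
    """Return the tagged line, or None if no city occurs in it."""
    match = pattern.search(line)
    if match is None:
        return None
    beg = line[:match.start()].strip()
    # Drop surrounding spaces, then trailing '.' or '!' from the end context
    end = line[match.end():].strip().rstrip('.!')
    return '<main><beg>%s</beg><key>%s</key><end>%s</end></main>' % (
        beg, match.group(0), end)

cities = ['San Francisco', 'Los Angeles', 'Seattle', 'Dallas']
pattern = build_pattern(cities)
for line in ['Atlanta is chill and laid-back.', 'Go to Seattle in the summer.']:
    tagged = tag_line(line, pattern)
    if tagged is not None:
        print(tagged)
# <main><beg>Go to</beg><key>Seattle</key><end>in the summer</end></main>
```

The sketch runs on Python 2 and Python 3. In a real script the two lists would come from cities.txt and master.txt rather than being hard-coded.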