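I have two text files of space-delimited email addresses, newalias.txt and origalias.txt. They are essentially email alias mappings that I want to merge, but some aliases in the first column appear in both files. Where the first column matches, I want to keep the line from newalias.txt and drop the duplicate from origalias.txt. Exact duplicates should also be removed.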
OrigAlias:
sam@example.com sam.smith@example.root.org
jane@example.com jane.maiden@example.root.org
bob@example.com robert.johnson@example.root.org
NewAlias:
sam@example.com samuel.smith@example.root.org
jane@example.com jane.married@example.root.org
bob@example.com robert.johnson@example.root.org
Results:
sam@example.com samuel.smith@example.root.org
jane@example.com jane.married@example.root.org
bob@example.com robert.johnson@example.root.org
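The rule described above (a newalias.txt entry wins when the alias is shared, and an identical line collapses into one entry) is how key assignment in a Python dictionary behaves. Below is a minimal sketch on the sample data. It assumes each line holds exactly one alias and one address. `orig_lines` and `new_lines` are hypothetical in-memory stand-ins for the two files.

```python
# Hypothetical in-memory copies of origalias.txt and newalias.txt
orig_lines = [
    "sam@example.com sam.smith@example.root.org",
    "jane@example.com jane.maiden@example.root.org",
    "bob@example.com robert.johnson@example.root.org",
]
new_lines = [
    "sam@example.com samuel.smith@example.root.org",
    "jane@example.com jane.married@example.root.org",
    "bob@example.com robert.johnson@example.root.org",
]

# One dict per file: alias -> address (each split() yields a 2-item list)
orig = dict(line.split() for line in orig_lines)
new = dict(line.split() for line in new_lines)

# Later mappings override earlier ones, so entries from `new` win;
# identical alias/address pairs simply collapse into a single key.
merged = {**orig, **new}

for alias, address in merged.items():
    print(alias, address)
```

The `{**orig, **new}` syntax needs Python 3.5 or later. On 3.4, `merged = orig.copy()` followed by `merged.update(new)` gives the same result.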
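I've been learning Python recently and have built a few fun things, but text parsing is still a challenge for me. Any help would be greatly appreciated, even just a pointer in the right direction. I'm still getting familiar with the options Python offers.
Edit:
I didn't expect such quick responses. In the meantime I worked on the problem myself and came up with this: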
# Py 3.4.1
# Instructions:
# Rename current domain mapping export to dmapsOrig.txt
# Rename whitespace delimited customer modifications file to dmapsNew.txt
# Place the two text files and this script in the same directory
# Run the script: 'python dmapsMerge.py'
from datetime import date

OrigDict = {}  # Create empty dictionaries for processing
NewAddDict = {}
ResultsDict = {}

with open('dmapsOrig.txt', 'r') as file1:  # Populate OrigDict dictionary from dmapsOrig.txt file
    for x in file1:
        if x.strip() and not x.startswith("#"):  # Ignore blank and commented lines
            dmaps = x.split()
            OrigDict[dmaps[0]] = dmaps[1]

with open('dmapsNew.txt', 'r') as file2:  # Populate NewAddDict dictionary from dmapsNew.txt file
    for y in file2:
        if y.strip() and not y.startswith("#"):  # Ignore blank and commented lines
            newdmaps = y.split()
            NewAddDict[newdmaps[0]] = newdmaps[1]

with open('dmapsOrig-formatted-%s.txt' % date.today(), 'wt') as file3:
    file3.write('## Generated on %s\n' % date.today())  # Insert date stamp
    for alias in sorted(OrigDict.keys()):
        file3.write(alias + ' ' + OrigDict[alias] + '\n')  # Format original input and write to sorted file

ResultsDict = OrigDict.copy()  # Copy OrigDict keys and values to ResultsDict
ResultsDict.update(NewAddDict)  # Merge new dmaps into original; new values win for duplicate aliases

with open('dmapsResults-%s.txt' % date.today(), 'wt') as file4:
    file4.write('## Generated on %s\n' % date.today())  # Insert date stamp
    for alias in sorted(ResultsDict.keys()):
        file4.write(alias + ' ' + ResultsDict[alias] + '\n')  # Format dictionary output and write to results file
# No explicit close() calls are needed: each 'with' block closes its file on exit
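The two parsing loops in the script above are nearly identical, so they could be folded into one function. Here is a sketch of that refactor. `load_aliases` is a hypothetical helper name, and it assumes the same format as the script: whitespace-delimited lines with `#` comments. It also skips blank and one-field lines instead of raising `IndexError`.

```python
def load_aliases(lines):
    """Parse whitespace-delimited 'alias address' lines into a dict.

    Hypothetical helper: skips blank lines, '#' comments and lines with
    fewer than two fields; any fields after the second are ignored.
    """
    aliases = {}
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line or comment
        fields = stripped.split()
        if len(fields) < 2:
            continue  # malformed line: no address
        aliases[fields[0]] = fields[1]
    return aliases


# With the script's files, the merge would become:
# with open("dmapsOrig.txt") as f1, open("dmapsNew.txt") as f2:
#     results = load_aliases(f1)
#     results.update(load_aliases(f2))  # new mappings win

sample = [
    "# exported mappings\n",
    "\n",
    "sam@example.com sam.smith@example.root.org\n",
]
print(load_aliases(sample))
```

Because a file object yields lines when iterated, the same function accepts an open file or a plain list of strings. This also makes it easy to test without touching the disk.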
Answer 0 (score: 1)
with open('origalias.txt') as forig, open('newalias.txt') as fnew, open('results.txt', 'w') as fresult:
    dd = {}
    for fn in (forig, fnew):  # first pass loads the originals, second pass overwrites with new entries
        for ln in fn:
            parts = ln.split()  # split() also drops the trailing newline
            if len(parts) != 2:  # skip blank or malformed lines
                continue
            alias, address = parts
            dd[alias] = address
    # write out every entry in the dictionary (items() works in Python 3; iteritems() is Python 2 only)
    for alias, address in dd.items():
        fresult.write('%s %s\n' % (alias, address))
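This approach relies on the order in which the files are read: whichever file is processed last wins for a shared alias. The sketch below generalizes it to any number of sources and sorts the output so the result file is deterministic. `merge_aliases` is a hypothetical name, and it assumes two fields per line.

```python
def merge_aliases(*sources):
    """Merge alias mappings from several iterables of lines.

    Sources are processed in order, so a later source overrides an earlier
    one whenever they share an alias (hypothetical helper for illustration).
    """
    merged = {}
    for source in sources:
        for line in source:
            fields = line.split()
            if len(fields) == 2:  # ignore blank or malformed lines
                alias, address = fields
                merged[alias] = address
    # Sort by alias so the output is the same on every run
    return ["%s %s" % (alias, merged[alias]) for alias in sorted(merged)]


orig = ["sam@example.com sam.smith@example.root.org\n"]
new = ["sam@example.com samuel.smith@example.root.org\n"]

print(merge_aliases(orig, new))  # ['sam@example.com samuel.smith@example.root.org']
print(merge_aliases(new, orig))  # ['sam@example.com sam.smith@example.root.org']
```

Passing the original file first and the new file second gives the behavior asked for in the question. Swapping them would silently keep the old addresses.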
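Answer 1 (score: 1)
Assuming your files aren't too large, the simplest solution is to load origalias.txt into memory, then load newalias.txt (updating existing entries where necessary), and dump the merged data.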
aliases = {}
with open("origalias.txt") as f:
    for line in f:
        key, val = line.strip().split(" ")
        aliases[key] = val
with open("newalias.txt") as f:
    for line in f:
        key, val = line.strip().split(" ")
        aliases[key] = val
with open("mergedalias.txt", "w") as f:
    for key, val in aliases.items():
        f.write("{} {}\n".format(key, val))
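A few key points about the code above:
- aliases is a dictionary, which prevents duplicates: setting a new value for an existing key replaces the old one.
- Iterating over a file object (as the for loops do) yields one line per iteration, which is convenient in your case.
- .strip() removes leading and trailing whitespace; then .split(" ") cuts the string at the space, and the two parts are assigned to key and val respectively.
- If a line contains more than one space, the assignment to key, val raises an exception. Consider using .split(" ", 1), which splits at the first space only, for more tolerant behavior.
Hope this helps.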
Answer 2 (score: 1)
# construct a dictionary from the original file (split() drops the trailing newline)
with open('origalias.txt') as f:
    original_dict = dict(tuple(line.split()) for line in f if line.strip())
# build a dictionary from the new file and merge it in (new values overwrite old ones for the same key)
with open('newalias.txt') as f:
    original_dict.update(dict(tuple(line.split()) for line in f if line.strip()))
# now write to a new file if you want
with open('newfile', 'w') as fp:
    for key, value in original_dict.items():
        fp.write('%s %s\n' % (key, value))
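There is a common pitfall when parsing these lines. `split(' ')` keeps each line's trailing newline attached to the address, so writing `'%s %s\n'` afterwards doubles it and leaves a blank line after every entry. Here is a short demonstration on one sample line:

```python
line = "bob@example.com robert.johnson@example.root.org\n"

# split(' ') leaves the newline attached to the second field
print(line.split(' '))  # ['bob@example.com', 'robert.johnson@example.root.org\n']

# Re-adding '\n' on output then produces an extra blank line
print(repr('%s %s\n' % tuple(line.split(' '))))
# 'bob@example.com robert.johnson@example.root.org\n\n'

# split() with no argument discards surrounding whitespace, including the newline
print(repr('%s %s\n' % tuple(line.split())))
# 'bob@example.com robert.johnson@example.root.org\n'
```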